Abstract

Finding cliques in random graphs and the closely related “planted” clique variant, where a
clique of size k is planted in a random G(n, 1/2) graph, have been the focus of substantial study
in algorithm design. Despite much effort, the best known polynomial-time algorithms only solve
the problem for k = Ω(√n).

In this paper we study the complexity of the planted clique problem under algorithms from
the Sum-Of-Squares hierarchy. We prove the first average case lower bound for this model: for
almost all graphs in G(n, 1/2), r rounds of the SOS hierarchy cannot find a planted k-clique
unless $k > (\sqrt{n}/\log n)^{1/r}/C^r$. Thus, for any constant number of rounds, planted cliques of size
$n^{o(1)}$ cannot be found by this powerful class of algorithms. This is shown via an integrality
gap for the natural formulation of the maximum clique problem on random graphs for the SOS and
Lasserre hierarchies, which in turn follows from degree lower bounds for the Positivstellensatz
proof system.

We follow the usual recipe for such proofs. First, we introduce a natural “dual certificate”
(also known as a “vector solution” or “pseudo-expectation”) for the given system of polynomial
equations representing the problem for every fixed input graph. Then we show that the matrix
associated with this dual certificate is PSD (positive semi-definite) with high probability over the
choice of the input graph. This requires the use of certain tools. One is the theory of association
schemes, and in particular the eigenspaces and eigenvalues of the Johnson scheme. Another is
a combinatorial method we develop to compute (via traces) norm bounds for certain random
matrices whose entries are highly dependent; we hope this method will be useful elsewhere.


1 Introduction 

1.1 The problem and main result 

Finding cliques in random graphs has been the focus of substantial study in algorithm design. Let
G(n, p) denote Erdős-Rényi random graphs on n vertices where each edge is kept in the graph with
probability p. It is easy to check that in a random graph G ← G(n, 1/2), the largest clique has
size (2 + o(1)) log₂ n with high probability. On the other hand, the best known polynomial-time
algorithms can only find cliques of size (1 + o(1)) log₂ n, and obtaining better algorithms remains a
longstanding open problem: Karp [Kar76] suggested that even finding cliques of size (1 + ε) log₂ n
could require superpolynomial time.

Motivated by this, much attention has been given to the related planted clique problem or
hidden clique problem introduced by Jerrum [Jer92] and Kucera [Kuc95]. Here, we are given a
graph G ← G(n, 1/2, k) generated by first choosing a G(n, 1/2) random graph and then placing a clique
of size k in the random graph, for k ≫ log₂ n. The goal is to recover the hidden clique, given G, for as
small a k as possible. The study of the planted clique problem and its variations (like finding
planted dense subgraphs) is also motivated by several more recent directions. Its potential
hardness on average has led to proposals to base cryptosystems on variants of it [ABW10]. It
was used by [AAK+07] to argue that testing k-wise independence is hard near the information-theoretic
limit. It is used in [ABBG10] to argue that evaluating some financial derivatives is hard.
It was also used to justify the hardness of sparse principal component detection by Berthet and
Rigollet [BR13]. Another source of interest comes from the related algorithmic problem of finding
large communities in social networks. The best known polynomial-time algorithms can solve the
problem for k = Ω(√n) [AKS98] (see [DGGP14] for a near linear-time algorithm), and improving
on this bound has received significant attention. The algorithmic problem has also been of much
interest in the context of signal finding in molecular biology (pattern discovery in DNA sequences)
as modeled in the work of [PS+00].

In this work we exhibit a lower bound for the problem in the powerful Lasserre [Las01] and
“sum-of-squares” (SOS) [Par00] semi-definite programming hierarchies¹. As it happens, proving
such lower bounds for the planted clique problem reduces easily to proving an integrality gap of
value k for the natural formulation of the maximum clique problem in these hierarchies on G(n, 1/2)
graphs. Our main result, then, is the following average-case lower bound for maximum clique. We
defer the formal definition of the semi-definite relaxation and hierarchies for now, and only note a
few facts. First, implementing the rth level of the SOS hierarchy (namely, r rounds) takes
roughly $n^{O(r)}$ time, which is polynomial for constant r. Second, the above algorithm for k = Ω(√n)
may be viewed as implementing only one round. Third, r = log n suffices for an exact solution of
the problem, namely finding the maximum clique. Our lower bound implies that polynomial time
(when the number of rounds r is constant) cannot handle even $k = n^{o(1)}$, and that as many as
(logn)^/^ rounds cannot handle k = (logn)^*'^\ Here are more precise statements^. 

Theorem 1.1. With high probability, for G ← G(n, 1/2) the natural r-round SOS relaxation of the
maximum clique problem has an integrality gap of at least $n^{1/2r}/(C^r (\log n)^{1/r})$.

As a corollary we obtain the following lower bound for the planted clique problem. 

Corollary 1.2. With high probability, for G ← G(n, 1/2, t) the natural r-round SOS relaxation of
the planted clique problem has an integrality gap of at least $n^{1/2r}/(t\, C^r (\log n)^{1/r})$.

1.2 Background and related work 

Linear and semi-definite hierarchies are among the most powerful and well-studied techniques in
algorithm design. The most prominent of these are the Sherali-Adams hierarchy (SA) [SA90], the
Lovász-Schrijver hierarchy (LS) [LS91], their semi-definite versions SA+ and LS+, and the Lasserre and SOS
hierarchies. The hierarchies present progressively stronger convex relaxations for combinatorial
optimization problems parametrized by the number of rounds r, where in all of them the r-round
relaxation can be solved in time $n^{O(r)}$ on instances of size n. In terms of relative power
(barring some minor technicalities about how the numbering of rounds starts), it is known that
LS+(r) ≤ SA+(r) ≤ SOS(r). Because they capture most powerful techniques for combinatorial
optimization, lower bounds for hierarchies serve as strong unconditional evidence for computational
hardness. Such lower bounds are even more relevant and compelling in situations where we do not
have NP-hardness results, as is the case for typical average-case optimization problems.

¹For brevity, in the following, we will use SOS hierarchy as a common term for the formulations of Lasserre [Las01]
and Parrilo [Par00], which are essentially the same in our context.

²Throughout, c, C denote constants.

Broadly speaking, our understanding of the SOS hierarchy is more limited than that of the LS+ and
SA+ hierarchies, and in fact the SOS hierarchy appears to be much more powerful. A particularly
striking example of this phenomenon was provided by a recent work of Barak et al. [BBH+12].
They showed that a constant number of rounds of the SOS hierarchy can solve the much studied
unique games problem on instances which need a super-constant number of LS+ and SA+ rounds. It was
also shown by the works of [BRS11, GS11] that the SOS hierarchy captures the sub-exponential
algorithm for unique games of [ABS10]. These results emphasize the need for a better understanding
of the power and limitations of the SOS hierarchy.

From the perspective of proving limitations, all known lower bounds for the SOS hierarchy
essentially have their origins in the works of Grigoriev [Gri01b, Gri01a], some of which were later
independently rediscovered by Schoenebeck [Sch08]. These works show that even Ω(n) rounds of the
SOS hierarchy cannot solve random 3XOR or 3SAT instances, implying a strong unconditional
average-case lower bound for a natural distribution.

Most subsequent lower bounds for the SOS hierarchy, such as those of [Tul09] and [BCV+12], rely on
[Gri01b] and [Sch08] and gadget reductions. For example, Tulsiani [Tul09] shows that
rounds of SOS has an integrality gap of for maximum clique in worst-case. This is in
stark contrast to the average-case setting: even a single round of SOS gets an integrality gap of
at most O(√n) for maximum clique on G(n, 1/2) [FK00]. Thus, the worst-case and average-case
problems have very different complexities. Finally, reductions tend to induce distributions
that are far from uniform and definitely not as natural as G(n, 1/2).

For max-clique on random G(n, 1/2) graphs, Feige and Krauthgamer [FK00] showed that
LS+(r), and hence SOS(r), has an integrality gap of at most $O(\sqrt{n/2^r})$ with high probability.
Complementing this, they also showed [FK03] that the gap remains $\sqrt{n/2^r}$ for LS+(r) with high
probability. However, no non-trivial lower bounds were known for the stronger SOS hierarchy.

For the planted clique problem, other algorithmic techniques have been studied. Jerrum [Jer92]
showed that a broad class of Markov chain Monte Carlo (MCMC) based methods cannot solve the
problem when the planted clique has size $n^{1/2-\delta}$ for any constant δ > 0. Another approach for
the planted clique problem, based on optimizing a third-order tensor, was suggested by Frieze and
Kannan [FK08]. However, the corresponding optimization problem is NP-hard in the worst case.

In a recent work, Feldman et al. [FGR+13] introduced the framework of statistical algorithms,
which generalizes many algorithmic approaches like MCMC methods, and showed that such algorithms
cannot find large cliques when the planted clique has size $n^{1/2-\delta}$ in less than
$n^{\Omega(\log n)}$ time³. However, their framework seems quite different from hierarchy-based algorithms.
In particular, the statistical algorithms framework is not applicable to algorithms which first pick a sample,
fix it, and then perform various operations (such as convex relaxations) on it, as is the case for the
hierarchies above.

³The results of [FGR+13] actually apply to the harder bipartite planted clique problem, but this assumption is
not too critical.

Meka and Wigderson [MW13] addressed SOS lower bounds for planted clique and claimed a 
stronger bound than Thm 1.1. While there was a fatal error in their proof, many of the techniques 
introduced there are used in the present paper. 

Independent of our work, Deshpande and Montanari [DM15] recently gave a degree-4 SOS lower
bound for planted clique; while they are only able to handle the degree-4 case (i.e., r = 2), they
obtain a better bound for this case than we do (roughly $n^{1/3}$ vs $n^{1/4}$).

1.3 Proof systems and SDP hierarchies 

A potentially simpler problem than deciding if a large clique exists is the problem of producing
short certificates of the non-existence of such cliques. This puts the problem in the realm of proof
complexity. Indeed, we approach the problem of SOS lower bounds from this viewpoint, via the
positivstellensatz proof system perspective of Grigoriev and Vorobjov [GV01]. We explain this
proof system next in general, and then specialize to Boolean problems and specifically to planted
clique.

Suppose we are given a system of polynomial equations or “axioms”

$$f_1(x) = 0,\; f_2(x) = 0,\; \ldots,\; f_m(x) = 0,$$

where each $f_i : \mathbb{R}^n \to \mathbb{R}$ is an n-variate polynomial. A positivstellensatz refutation of the system
$\mathcal{F} = \{f_i\}$ is an identity of the form

$$\sum_{i=1}^{m} f_i\, g_i = 1 + \sum_{j=1}^{N} h_j^2,$$

where $\{g_1, \ldots, g_m\}$ and $\{h_1, \ldots, h_N\}$ are arbitrary n-variate polynomials. Clearly, if there exists
an identity as above, then the system $\mathcal{F}$ has no solution over the reals: at a common zero of the $f_i$
the left-hand side vanishes, while the right-hand side is at least 1. Starting with the seminal work
of Artin on Hilbert’s seventeenth problem [Art27], a long line of important results in real algebraic
geometry ([Kri64, Ste73, Put93, Sch91]; cf. [BCR98] and references therein) showed that, under
some (important) technical conditions⁴, such certifying identities always exist for an infeasible
system. This motivates the following notion of complexity for refuting systems of polynomial
equations.

Definition 1.3 (Positivstellensatz Refutation, [GV01]). Let $\mathcal{F} = \{f_1, \ldots, f_m : \mathbb{R}^n \to \mathbb{R}\}$ be a
system of axioms, where each $f_i$ is a real n-variate polynomial. A positivstellensatz refutation of
degree r (PS(r) refutation, henceforth) for $\mathcal{F}$ is an identity of the form

$$\sum_{i=1}^{m} f_i\, g_i = 1 + \sum_{j=1}^{N} h_j^2, \qquad (1.1)$$

where $g_1, \ldots, g_m, h_1, \ldots, h_N$ are n-variate polynomials such that $\deg(f_i g_i) \leq 2r$ for all $i \in [m]$ and
$\deg(h_j) \leq r$ for all $j \in [N]$.
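For illustration (a toy example of ours, not taken from the text), take n = 1 with the Boolean axiom $f_1 = x^2 - x$ and the axiom $f_2 = 2x - 1$, which have no common real root. Choosing $g_1 = -4$, $g_2 = 2x - 1$ and an empty sum of squares (N = 0) gives a PS(1) refutation in the form (1.1):

```latex
\[
  \underbrace{(x^2 - x)}_{f_1}\cdot(-4) \;+\; \underbrace{(2x - 1)}_{f_2}\cdot(2x - 1)
  \;=\; -4x^2 + 4x + 4x^2 - 4x + 1 \;=\; 1,
  \qquad \deg(f_1 g_1) = \deg(f_2 g_2) = 2 = 2r \text{ for } r = 1.
\]
```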

⁴We avoid going into the details here, as the conditions are easily met in the presence of Boolean axioms.
Our interest in positivstellensatz refutations as above comes from the known relations between
such identities and the SOS hierarchy. Informally (and under appropriate technical conditions), identities
as above of degree r show that the SOS hierarchy can certify infeasibility of the axioms in 2r + O(1)
rounds, and vice versa. We will focus on showing degree lower bounds for identities as above and
use them to get integrality gaps for the SOS hierarchy. We formalize this in Section 12. For a
brief history of the different formulations from [GV01], [Las01], [Par00] and the relations between
them and results in real algebraic geometry, we refer the reader to [OZ13].

Given the above setup, we shall consider the following set of natural axioms to test if a graph 
G has a clique of size k. 

Definition 1.4. Given a graph G, let Clique(G, k) denote the following set of polynomial axioms:

$$
\text{(Max-Clique):}\quad
\begin{aligned}
&x_i^2 - x_i = 0, && \forall i \in [n],\\
&x_i \cdot x_j = 0, && \forall \text{ pairs } \{i,j\} \notin G,\\
&\textstyle\sum_{i=1}^{n} x_i - k = 0.
\end{aligned}
\qquad (1.2)
$$

Here, the equations on the first line are Boolean axioms restricting feasible solutions to be in
$\{0,1\}^n$. The equations on the second line constrain the support of any feasible x to define a clique
in G. Finally, the equation on the third line specifies the size of the support of x. Thus, for any graph
G, Clique(G, k) is feasible if and only if G has a clique of size k. Our core result is to show lower
bounds on positivstellensatz refutations for Clique(G, k).
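The equivalence between feasibility of Clique(G, k) and the existence of a k-clique can be checked mechanically on tiny graphs. The sketch below uses hypothetical helper names and represents G as a set of frozenset pairs (our own choice); it enumerates only $\{0,1\}^n$ because the Boolean axioms already exclude every other real point.

```python
from itertools import product

def clique_axioms_satisfied(x, edges, k):
    """Evaluate every polynomial of Clique(G, k) at the point x; True iff all of them vanish.

    edges is a set of frozenset({i, j}) pairs (a hypothetical representation of G).
    """
    n = len(x)
    boolean = all(xi * xi - xi == 0 for xi in x)               # x_i^2 - x_i = 0
    non_edges = all(x[i] * x[j] == 0                           # x_i * x_j = 0 for {i, j} not in G
                    for i in range(n) for j in range(i + 1, n)
                    if frozenset((i, j)) not in edges)
    size = sum(x) - k == 0                                     # sum_i x_i - k = 0
    return boolean and non_edges and size

def clique_system_feasible(n, edges, k):
    """Brute force over {0,1}^n; the Boolean axioms rule out every other real point."""
    return any(clique_axioms_satisfied(x, edges, k) for x in product((0, 1), repeat=n))

# A triangle {0, 1, 2} with a pendant vertex 3 attached to 2.
edges = {frozenset(e) for e in [(0, 1), (0, 2), (1, 2), (2, 3)]}
print(clique_system_feasible(4, edges, 3), clique_system_feasible(4, edges, 4))  # True False
```

Brute force is exponential in n, of course; the point of the paper is precisely that short algebraic certificates of infeasibility do not exist for random G.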

Theorem 1.5 (Main). With high probability over G ← G(n, 1/2), the system Clique(G, k) defined
by Equation 1.2 has no PS(r) refutation for $k \leq n^{1/2r}/(C^r (\log n)^{1/r})$.

Given the above theorem, it is easy to deduce the integrality gap for the SOS hierarchy stated in
Theorem 1.1; see Section 12. We next highlight the outline of the proof and some of our techniques
which may be of broader interest.

1.4 Outline 

We now give an outline of our arguments. As in most previous works (cf. [Gri01a], [Gri01b],
[Sch08]) on showing lower bounds for PS(r) refutations, our main tool will be a dual certificate.
We note that in the context of hierarchies, this object is called either a vector solution⁵
or a pseudo-expectation⁶. We now turn to define this important notion, which arises naturally from
using duality to prove that a degree-r refutation like (1.1) does not exist. Let $\mathcal{P}(n, 2r)$ be
the set of n-variate real polynomials of total degree at most 2r.

Definition 1.6 (PSD Mappings). A linear mapping $\mathcal{M} : \mathcal{P}(n,2r) \to \mathbb{R}$ is said to be positive
semi-definite (PSD) if $\mathcal{M}(P^2) \geq 0$ for all n-variate polynomials P of degree at most r.

Definition 1.7 (Dual Certificates). Given a set of axioms $f_1, \ldots, f_m$, a dual certificate for the
axioms is a PSD mapping $\mathcal{M} : \mathcal{P}(n,2r) \to \mathbb{R}$ such that $\mathcal{M}(f_i g) = 0$ for all $i \in [m]$ and all
polynomials g such that $\deg(f_i g) \leq 2r$.

⁵In which numerical values of variables are replaced by vector values.

⁶Reflecting the view of these values as moments of a (possibly nonexistent) probability distribution.

Under reasonable technical conditions which ensure strong duality, the converse of Lemma 1.8 below
also holds: if there is no PS(r) refutation, then a dual certificate exists. For
the clique axioms from Equation 1.2, a dual certificate would correspond to a feasible vector solution
for the r-round SOS relaxation for maximum clique (see Figure 12 for the exact formulation) with
value k.

The following elementary lemma will be crucial. 

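Lemma 1.8 (Dual Certificate). Given a system of axioms $\{f_i\}$, there does not exist a PS(r)
refutation of the system if there exists a dual certificate $\mathcal{M} : \mathcal{P}(n, 2r) \to \mathbb{R}$ for the axioms with $\mathcal{M}(1) > 0$.

The existence of such a mapping trivially implies a lower bound for PS(r) refutations: apply
$\mathcal{M}$ to both sides of a purported PS(r) identity as in Equation 1.1. The left-hand side is mapped to
$\sum_i \mathcal{M}(f_i g_i) = 0$, whereas the right-hand side is mapped to $\mathcal{M}(1) + \sum_j \mathcal{M}(h_j^2) \geq \mathcal{M}(1) > 0$, a contradiction.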

The lemma suggests a general recipe for proving PS(r) refutation lower bounds:

• Design a dual certificate $\mathcal{M}$: For the clique axioms we care about, it is easy to figure out what
the right dual certificate $\mathcal{M}$ “should be” by working backwards from the axioms. The same
happens also for the PS(r) refutation lower bounds of [Gri01a, Gri01b]. The main hurdle
then is to show that the obtained mapping $\mathcal{M}$ is indeed PSD. At a high level, this reduces
to proving that a certain random matrix $M \in \mathbb{R}^{\binom{[n]}{r} \times \binom{[n]}{r}}$ is PSD. We show that M is PSD in three
steps.

• Reduction to PSDness of another matrix M': The matrix M has many zero rows and columns,
which makes it difficult to work with. In Section 5 we fix this by filling in the zero rows and
columns of M to obtain a new matrix M'. We then argue that to show M is PSD it is
sufficient to show that M' is PSD.

• (Deterministic) Matrix analysis: $E = \mathbb{E}[M']$ is PSD with a large minimum eigenvalue
$\lambda_{\min}(E)$. We show this statement in Section 7 by using the theory of association
schemes described below.

• Large deviation: with high probability, $\|M' - E\| < \lambda_{\min}(E)$. This is done by using the
structure of our matrix M' along with a careful application of the trace method to bound the
norms of certain random matrices with dependent entries. Together with the previous step, this gives
$M' \succeq (\lambda_{\min}(E) - \|M' - E\|)\, I \succ 0$.

We note here the main techniques used. 

Techniques: Association schemes. As discussed, the essence of proving Theorem 1.5 involves
showing that a certain random matrix is positive semi-definite (PSD) with high probability. In
our case, this calls for showing a relation of the form A ≼ B⁷ for two matrices A, B whose rows
and columns are indexed by subsets of [n] of size r. This in turn leads us to matrices which,
though complicated to describe, are set-symmetric: the entry defined by any two (row and
column) sets I, J depends solely on the size of the intersection I ∩ J. The set of all such matrices,
called the Johnson scheme, is quite well studied in combinatorics as a special case of association
schemes. In particular, all such matrices commute with one another and their common eigenspaces
are completely understood. This theory allows us to estimate the eigenvalues and norms of various
matrices that arise in the analysis.
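The commutativity of set-symmetric matrices can be checked numerically on a small case. In this sketch (hypothetical helper names; the weights are arbitrary integers chosen only for illustration), a matrix on the r-subsets of [n] is given by a weight list w with entry w[|I ∩ J|]; the product of two such matrices is again set-symmetric, and the two products agree.

```python
from itertools import combinations

def set_symmetric_matrix(n, r, w):
    """Matrix indexed by the r-subsets of {0, ..., n-1} with (I, J) entry w[|I & J|]."""
    subsets = [frozenset(c) for c in combinations(range(n), r)]
    return subsets, [[w[len(I & J)] for J in subsets] for I in subsets]

def matmul(A, B):
    """Plain integer matrix product."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def is_set_symmetric(subsets, A):
    """True if A[I][J] depends only on |I & J|."""
    seen = {}
    for a, I in enumerate(subsets):
        for b, J in enumerate(subsets):
            if seen.setdefault(len(I & J), A[a][b]) != A[a][b]:
                return False
    return True

subsets, A = set_symmetric_matrix(6, 2, [1, 2, 5])
_, B = set_symmetric_matrix(6, 2, [3, -1, 4])
AB, BA = matmul(A, B), matmul(B, A)
print(AB == BA, is_set_symmetric(subsets, AB))  # True True
```

Because every such matrix is a combination of the r + 1 "intersection size" indicator matrices, closure under products is what lets eigenvalues be tracked through long algebraic manipulations.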

⁷Here and henceforth ≼ denotes the PSD ordering: A ≼ B if and only if B − A is positive semi-definite.
Techniques: Trace bounds for locally random matrices. After various simplifications and
reductions, a central problem we have to deal with is upper bounding the spectral norm of certain
random matrices defined by the underlying random graph G ← G(n, 1/2). As above, these matrices
have rows and columns indexed by subsets of vertices. The entry (I, J) of the matrix will be a
random variable of expectation zero which depends only on the edges and non-edges of G in the
subgraph induced by I ∪ J (hence we call such matrices local). In the simple case when r = 1
(so rows and columns are indexed by singletons), which is the one studied in the analysis of the
√n-approximation algorithm, the random variables in all entries are mutually independent (apart from
symmetry), and a norm bound is easy to obtain by a straightforward use of the trace method. However,
for r > 1, as we need to handle, two entries of the matrix are dependent whenever the edge sets they
depend on intersect. This significantly complicates the trace calculation, and we develop some combinatorial
tools to bound the trace of high powers of such local matrices.
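For intuition about the r = 1 case, here is a minimal sketch of the trace method on a symmetric random ±1 matrix with independent entries above the diagonal. It only illustrates the quantity $(\mathrm{tr}\, A^{2q})^{1/(2q)}$, which upper-bounds $\|A\|$ for symmetric A and is non-increasing in q; it is not the combinatorial argument developed for r > 1. Helper names and sizes are illustrative assumptions.

```python
import random

def random_sign_matrix(n, seed=0):
    """Symmetric n x n matrix, zero diagonal, independent uniform +-1 entries above the diagonal
    (the r = 1 situation: entry (i, j) depends only on the single edge {i, j})."""
    rng = random.Random(seed)
    A = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            A[i][j] = A[j][i] = rng.choice((-1, 1))
    return A

def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def trace_bound(A, q):
    """(tr A^{2q})^{1/(2q)}, an upper bound on the spectral norm of a symmetric A."""
    A2 = matmul(A, A)
    P = A2
    for _ in range(q - 1):
        P = matmul(P, A2)
    trace = sum(P[i][i] for i in range(len(A)))   # exact integer arithmetic
    return trace ** (1.0 / (2 * q))

A = random_sign_matrix(40)
for q in (1, 2, 3, 4):
    print(q, round(trace_bound(A, q), 2))
```

Since $\mathrm{tr}\, A^{2q} \le N\|A\|^{2q}$ for an N × N matrix, the bound is within a factor $N^{1/(2q)}$ of $\|A\|$, so q of order log N makes it tight up to a constant; the trivial row-sum bound here would be n − 1 = 39.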


2 Dual certificate for PS(r) refutations of max-clique 

We will specify the dual certificate $\mathcal{M}$ by defining it for polynomials where each individual variable
has degree at most 1, and extend $\mathcal{M}$ multilinearly to all polynomials: for any polynomial P,
$\mathcal{M}(P) = \mathcal{M}(\bar{P})$, where $\bar{P}$ is obtained from P by reducing the individual degrees of all variables to at
most 1. We can do this without loss of generality because of the Boolean axioms.

As mentioned in the introduction, we can often work out what the dual certificate should be
from the axioms and basic linear algebra. As an example, we first work out the case where the
graph G is the complete graph; this will also help us draw a concrete connection to the work of
[Gri01a].


2.1 Complete graph and knapsack 


For the complete graph, the clique axioms simplify to

$$\text{(Max-Clique):}\quad x_i^2 - x_i = 0 \;\; \forall i \in [n], \qquad \sum_{i=1}^{n} x_i - k = 0.$$

These incidentally also correspond to proving lower bounds for knapsack as studied by Grigoriev
[Gri01a] (and were what led us to the specific dual certificate we study). However, in the context
of lower bounds for knapsack, the axioms are mainly interesting for non-integer k, and Grigoriev
shows that for non-integer k < n/2, the above system has no PS(r) refutation for r < k.

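The above axioms tell us that any candidate dual certificate $\mathcal{M}_{Gr} : \mathcal{P}(n,2r) \to \mathbb{R}$ should
satisfy

$$\mathcal{M}_{Gr}\Big(\Big(\sum_{i=1}^{n} x_i - k\Big) X_I\Big) = 0, \qquad \forall I,\ |I| < 2r.$$

For $I \subseteq [n]$, let $X_I = \prod_{i \in I} x_i$. As the above equation is symmetric, it is natural to assume that
$\mathcal{M}_{Gr}$ is also symmetric in the sense that $\mathcal{M}_{Gr}(X_I) = f(|I|)$ for some function $f : \{0, \ldots, 2r\} \to \mathbb{R}$.
Expanding the equation above under the multilinear extension gives $(|I| - k) f(|I|) + (n - |I|) f(|I| + 1) = 0$;
working from this, Grigoriev derives the following recurrence relation for f:

$$f(i+1) = \frac{k - i}{n - i}\, f(i).$$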
From the above it follows that we can define f, and hence $\mathcal{M}_{Gr}$, as follows:

$$\mathcal{M}_{Gr}(X_I) = f(|I|) = \frac{k(k-1)\cdots(k-|I|+1)}{n(n-1)\cdots(n-|I|+1)}.$$

Grigoriev takes f(0) = 1. Here we set $f(0) = \binom{n}{2r}$ with a view towards what is to come. Thus, the
final certificate is

$$\mathcal{M}_{Gr}(X_I) = \binom{n}{2r}\frac{k(k-1)\cdots(k-|I|+1)}{n(n-1)\cdots(n-|I|+1)} = \binom{n-|I|}{2r-|I|}\frac{\binom{k}{|I|}}{\binom{2r}{|I|}}. \qquad (2.1)$$

Grigoriev shows the following:

Theorem 2.1 ([Gri01a]). For k < n/2, the mapping $\mathcal{M}_{Gr}$ defined above is PSD for r < k.
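Grigoriev's certificate can be evaluated exactly on small instances. The sketch below (hypothetical helper names; exact rational arithmetic via Python's `fractions`) uses Grigoriev's normalization f(0) = 1 (rescaling to $f(0) = \binom{n}{2r}$ does not affect PSDness), checks the knapsack equations, builds the matrix with entries $f(|I \cup J|)$ indexed by subsets of size at most r, and tests PSDness by a standard exact symmetric elimination, which is our own choice and not a method from the text.

```python
from fractions import Fraction
from itertools import combinations

def falling(x, i):
    """Falling factorial x (x - 1) ... (x - i + 1), equal to 1 when i = 0."""
    out = Fraction(1)
    for t in range(i):
        out *= x - t
    return out

def grigoriev_f(n, k, i):
    """f(i) = k(k-1)...(k-i+1) / (n(n-1)...(n-i+1)) with Grigoriev's normalization f(0) = 1."""
    return falling(Fraction(k), i) / falling(Fraction(n), i)

def knapsack_axiom_ok(n, k, r):
    """Check (|I| - k) f(|I|) + (n - |I|) f(|I| + 1) = 0, i.e. M((sum x_j - k) X_I) = 0, for |I| < 2r."""
    return all((i - k) * grigoriev_f(n, k, i) + (n - i) * grigoriev_f(n, k, i + 1) == 0
               for i in range(2 * r))

def moment_matrix(n, k, r):
    """Rows/columns indexed by subsets of size <= r; entry (I, J) = M(X_{I u J}) = f(|I u J|)."""
    idx = [frozenset(c) for s in range(r + 1) for c in combinations(range(n), s)]
    return [[grigoriev_f(n, k, len(I | J)) for J in idx] for I in idx]

def is_psd(A):
    """Exact PSD test for a symmetric matrix by symmetric Gaussian elimination (Schur complements)."""
    A = [list(row) for row in A]
    m = len(A)
    for p in range(m):
        piv = A[p][p]
        if piv < 0:
            return False
        if piv == 0:
            # In a PSD matrix a zero diagonal entry forces the rest of its row to vanish.
            if any(A[p][q] != 0 for q in range(p + 1, m)):
                return False
            continue
        for i in range(p + 1, m):
            factor = A[i][p] / piv
            for j in range(p + 1, m):
                A[i][j] -= factor * A[p][j]
    return True

n, k = 6, Fraction(5, 2)                       # non-integer k < n/2, as in Theorem 2.1
print(knapsack_axiom_ok(n, k, 1), is_psd(moment_matrix(n, k, 1)))   # True True
```

For r = 1 the coefficient vector of $\sum_i x_i - k$ lies in the kernel of the matrix (since $\mathcal{M}_{Gr}((\sum_i x_i - k)^2) = 0$), so the final pivot is exactly zero; the zero-pivot branch of `is_psd` handles this. Trying r = 2 (a 22 × 22 matrix) is a cheap sanity check of Theorem 2.1, since 2 < k = 5/2 < n/2 = 3.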


2.2 Certificate for clique axioms 

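Following a similar approach, we now derive the dual certificate for the clique axioms from Equation
1.2, which we restate below for convenience: given a graph G on n vertices and k < n, the axioms of
Clique(G, k) are

$$
\text{(Max-Clique):}\quad
\begin{aligned}
&x_i^2 - x_i = 0, && \forall i \in [n],\\
&x_i \cdot x_j = 0, && \forall \text{ pairs } \{i,j\} \notin G,\\
&\textstyle\sum_{i=1}^{n} x_i - k = 0.
\end{aligned}
\qquad (2.2)
$$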

The above axioms tell us that any candidate dual certificate $\mathcal{M} = \mathcal{M}_G : \mathcal{P}(n, 2r) \to \mathbb{R}$ should
satisfy

$$
\begin{aligned}
&\mathcal{M}(X_I) = 0, && \forall I,\ |I| \leq 2r,\ I \text{ is not a clique in } G,\\
&\mathcal{M}\Big(\Big(\sum_{i=1}^{n} x_i - k\Big) X_I\Big) = 0, && \forall I,\ |I| < 2r.
\end{aligned}
\qquad (2.3)
$$

The above equations give us a system of linear equations that $\mathcal{M}$ needs to satisfy. By working
with the equations, it is easy to guess a natural solution for the system.

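Given a graph G on [n] and $I \subseteq [n]$ with $|I| \leq 2r$, let

$$\deg_G(I) = \big|\{S \subseteq [n] : I \subseteq S,\ |S| = 2r,\ S \text{ is a clique in } G\}\big|.$$

For instance, if r = 1 and v ∈ G, then $\deg_G(\{v\})$ is the degree of vertex v.

We define $\mathcal{M} = \mathcal{M}_G : \mathcal{P}(n, 2r) \to \mathbb{R}$ for monomials as follows: for $I \subseteq [n]$ with $|I| \leq 2r$, let

$$\mathcal{M}\Big(\prod_{i \in I} x_i\Big) = \deg_G(I) \cdot \frac{k(k-1)\cdots(k-|I|+1)}{2r(2r-1)\cdots(2r-|I|+1)} = \deg_G(I) \cdot \frac{\binom{k}{|I|}}{\binom{2r}{|I|}}. \qquad (2.4)$$

For the complete graph, $\deg_G(I) = \binom{n-|I|}{2r-|I|}$, so (2.4) coincides with Grigoriev's certificate (2.1).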


It is easy to check the following claim:

Claim 2.2. For any graph G, $\mathcal{M} = \mathcal{M}_G$ defined by Equation 2.4 satisfies Equations 2.3.
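Proof. The first equation in Equation 2.3 follows immediately from the definition of $\mathcal{M}$: if I is not a
clique, then no clique S contains I, so $\deg_G(I) = 0$. Now, for $I \subseteq [n]$ with $|I| < 2r$, using $x_i X_I = X_I$
for $i \in I$ under the multilinear extension,

$$
\begin{aligned}
\mathcal{M}\Big(\Big(\sum_{i} x_i - k\Big) X_I\Big)
&= (|I| - k)\,\mathcal{M}(X_I) + \sum_{j \notin I} \mathcal{M}(X_{I \cup \{j\}})\\
&= (|I| - k)\,\deg_G(I)\,\frac{k(k-1)\cdots(k-|I|+1)}{2r(2r-1)\cdots(2r-|I|+1)} + \sum_{j \notin I} \deg_G(I \cup \{j\})\,\frac{k(k-1)\cdots(k-|I|)}{2r(2r-1)\cdots(2r-|I|)}\\
&= \frac{k(k-1)\cdots(k-|I|+1)}{2r(2r-1)\cdots(2r-|I|+1)}\left((|I| - k)\deg_G(I) + \frac{k - |I|}{2r - |I|}\sum_{j \notin I} \deg_G(I \cup \{j\})\right).
\end{aligned}
$$

Observe that our notion of degree, $\deg_G$, satisfies the following recurrence: for $|I| < 2r$,

$$\deg_G(I) = \frac{1}{2r - |I|} \sum_{\substack{j \notin I,\\ j \text{ adjacent to all of } I}} \deg_G(I \cup \{j\}) = \frac{1}{2r - |I|} \sum_{j \notin I} \deg_G(I \cup \{j\}),$$

since every 2r-clique $S \supseteq I$ is counted once for each of the $2r - |I|$ vertices $j \in S \setminus I$, and
$\deg_G(I \cup \{j\}) = 0$ whenever j is not adjacent to all of I. The above two equations imply that $\mathcal{M}$
satisfies the second equation in 2.3. □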
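The identities in Claim 2.2 can be verified exactly by brute force on small graphs. This sketch (hypothetical helper names; exact arithmetic with `Fraction`; the non-integer k, graph size and edge density are illustrative choices) computes $\deg_G$ and $\mathcal{M}_G$ from Equation 2.4, checks both lines of (2.3), and checks the complete-graph specialization back to (2.1).

```python
import random
from fractions import Fraction
from itertools import combinations
from math import comb

def random_graph(n, p=0.5, seed=0):
    """Edge set (frozensets) of a sample from G(n, p)."""
    rng = random.Random(seed)
    return {frozenset(e) for e in combinations(range(n), 2) if rng.random() < p}

def is_clique(S, edges):
    return all(frozenset(e) in edges for e in combinations(sorted(S), 2))

def deg_G(I, n, r, edges):
    """Number of 2r-cliques S of G containing I."""
    rest = [v for v in range(n) if v not in I]
    return sum(1 for extra in combinations(rest, 2 * r - len(I))
               if is_clique(set(I) | set(extra), edges))

def M_G(I, n, r, k, edges):
    """Equation (2.4): deg_G(I) * k(k-1)...(k-|I|+1) / (2r(2r-1)...(2r-|I|+1))."""
    ratio = Fraction(1)
    for t in range(len(I)):
        ratio *= Fraction(k - t) / (2 * r - t)
    return deg_G(I, n, r, edges) * ratio

def claim_2_2_holds(n, r, k, edges):
    """Check both families of equations (2.3) on every monomial X_I with |I| <= 2r."""
    for s in range(2 * r + 1):
        for I in map(frozenset, combinations(range(n), s)):
            if not is_clique(I, edges) and M_G(I, n, r, k, edges) != 0:
                return False                                   # first line of (2.3)
            if s < 2 * r:
                lhs = (s - k) * M_G(I, n, r, k, edges) + sum(
                    M_G(I | {j}, n, r, k, edges) for j in range(n) if j not in I)
                if lhs != 0:
                    return False                               # second line of (2.3)
    return True

def complete_graph_matches_2_1(n, r):
    """On the complete graph, deg_G(I) = C(n - |I|, 2r - |I|), which turns (2.4) into (2.1)."""
    edges = {frozenset(e) for e in combinations(range(n), 2)}
    return all(deg_G(frozenset(I), n, r, edges) == comb(n - s, 2 * r - s)
               for s in range(2 * r + 1) for I in combinations(range(n), s))

print(claim_2_2_holds(7, 2, Fraction(5, 2), random_graph(7, p=0.7, seed=3)),
      complete_graph_matches_2_1(6, 2))   # True True
```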


Thus, to prove our main theorem, Theorem 1.5, it suffices to show that $\mathcal{M}$ as defined above is
PSD with high probability. We now argue that, in fact, to show that $\mathcal{M}$ is PSD we do not need to
consider all polynomials P of degree at most r. Rather, it is sufficient to show that $\mathcal{M}(P_1^2) \geq 0$
whenever $P_1$ is multilinear and homogeneous of degree r.

Lemma 2.3. For any P of degree at most r we may write $P = P_1 + \sum_i P_{2i}(x_i^2 - x_i) + P_3\big(\sum_i x_i - k\big)$,
where $P_1$ is multilinear and homogeneous of degree r, $P_3$ has degree at most r − 1, and all $P_{2i}$ have
degree at most r − 2.

Proof. We first make P multilinear by removing any terms which are not multilinear from P as
follows. If P has a term of the form $x_i^2 f$ where f has degree at most r − 2, write $x_i^2 f = (x_i^2 - x_i) f + x_i f$.
Iteratively applying this procedure, we may write P = P' plus terms of the form $(x_i^2 - x_i) f$, where
P' is multilinear of degree at most r and f has degree at most r − 2.

We now make P' multilinear and homogeneous of degree r by removing any terms which have
lower degree as follows. If P' has a term of the form $X_I$ where |I| < r, then (provided $k \neq |I|$) write

$$X_I = \frac{1}{k - |I|}\Big(\sum_{j \notin I} X_{I \cup \{j\}} + \sum_{j \in I} (x_j^2 - x_j) X_{I \setminus \{j\}} - \Big(\sum_{j} x_j - k\Big) X_I\Big).$$

Iteratively applying this procedure, we may write $P = P_1$ plus terms of the form $(x_i^2 - x_i) f$ and
terms of the form $(\sum_i x_i - k) g$, where $P_1$ is multilinear and homogeneous of degree r, all such f
have degree at most r − 2, and all such g have degree at most r − 1. Putting everything together,
the result follows. □



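Corollary 2.4. If $\mathcal{M}(P_1^2) \geq 0$ for all multilinear homogeneous $P_1$ of degree r, then $\mathcal{M}$ is PSD.

Proof. Assume $\mathcal{M}(P_1^2) \geq 0$ for all multilinear homogeneous $P_1$ of degree r and $\mathcal{M}(P^2) < 0$ for some
$P \in \mathcal{P}(n, r)$. Using Lemma 2.3, we may write $P = P_1 + \sum_i P_{2i}(x_i^2 - x_i) + P_3(\sum_i x_i - k)$, where $P_1$
is multilinear and homogeneous of degree r. Every term of $P^2 - P_1^2$ is a multiple of an axiom of total degree
at most 2r, on which $\mathcal{M}$ vanishes (by the multilinear extension and Claim 2.2), so $\mathcal{M}(P^2) = \mathcal{M}(P_1^2)$ and
hence $\mathcal{M}(P_1^2) < 0$. Contradiction. □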
Thus, showing that $\mathcal{M}$ is PSD with high probability is equivalent to showing that the following
matrix $M = M_G \in \mathbb{R}^{\binom{[n]}{r} \times \binom{[n]}{r}}$ is PSD with high probability for G ← G(n, 1/2): for $I, J \in \binom{[n]}{r}$,

$$M(I, J) = \deg_G(I \cup J) \cdot \frac{\binom{k}{|I \cup J|}}{\binom{2r}{|I \cup J|}}. \qquad (2.5)$$

In the remainder of the paper, we show that M is PSD with high probability for $k \leq n^{1/2r}/(C^r (\log n)^{1/r})$.

Theorem 2.5 (Main Technical Theorem). There exists a constant c > 0 such that, with high
probability over G ← G(n, 1/2), the matrix $M_G$ defined by Equation 2.5 is PSD for
$k \leq c^r (\sqrt{n}/\log n)^{1/r}$.


3 Overview of proof of Theorem 2.5 


The proof of Theorem 2.5 is quite technical, and is broken into two parts, where the second part is
further broken down into smaller parts. While we gave a sketch of the proof of Theorem 2.5 in the
introduction, we give a more detailed overview of the proof here. Recall that all matrices mentioned
below are random matrices which are specified by the choice of the random graph G.

As mentioned in the introduction, the matrix M = M_G has many zero rows and columns, which
makes it difficult to work with. The first part is to fill in the zero rows and columns of M to obtain
a new matrix, M', which is nonsingular and has no high-variance entries. In Section 5 we define
this matrix M' and show that if M' is PSD, so is M. The idea is that M and M' are symmetric
and the nonzero part of M is a principal submatrix of M', so the smallest eigenvalue of the nonzero
part of M is at least as large as the smallest eigenvalue of M'.

The second part is to prove that M' is PSD (indeed, we prove that it has a large positive smallest
eigenvalue). This is stated in the main technical lemma, Lemma 8.1. For the proof of Lemma 8.1 we
decompose the matrix M' as M' = E + L + A, where (a) $E = \mathbb{E}[M']$ is the expectation matrix; (b)
L is a “local” random matrix such that for sets I, J, L(I, J) only depends on the edges among
the vertices of I ∪ J; and (c) A is a “global” error matrix whose entries are small in magnitude.

Having defined E (which is set-symmetric), let us spell out what the other matrices are.
The “local” random matrix L is defined in a simple way as follows:

$$L(I, J) = \begin{cases} -E(I, J) & \text{if some edge in } \mathcal{E}(I \cup J) \setminus (\mathcal{E}(I) \cup \mathcal{E}(J)) \text{ is missing from } G,\\ \beta(|I \cap J|) & \text{otherwise,} \end{cases}$$

where $\mathcal{E}(I)$ denotes the set of possible edges between vertices of I and $\beta : \{0, \ldots, r\} \to \mathbb{R}_+$ is
suitably chosen so that each individual entry of L has expectation zero.

Finally, define the last matrix A = M' − E − L.

The proof that M' is PSD proceeds in three modular steps: 


1. We use the results about Johnson scheme to show that E >~ and has a large least eigenvalue 
(roughly Dj.(A:”n”)); see Section 7. 

2. We next show that ||L|| < log n by exploiting the recursive structure of the matrix 

L and some careful trace calculations. This is the most technically intensive part of the proof, 
and requires the development of some combinatorial tools to estimate the trace of high powers 
of L; see Section 8.2. 
3. We then show that ||A|| < log n. This is done by first showing that every entry of
A is small in magnitude, via concentration bounds on the number of cliques in random graphs,
and then bounding its norm using Gershgorin’s circle theorem (Lemma 4.1); see Section 8.3.


4 Preliminaries 


We shall use the following notation⁸:

1. $\mathcal{P}(n, 2r)$ denotes the set of n-variate polynomials of degree at most 2r.

2. PS(r) denotes positivstellensatz refutations of degree at most r as defined in Definition 1.3.

3. A linear mapping $\mathcal{M} : \mathcal{P}(n, 2r) \to \mathbb{R}$ is said to be positive semi-definite (PSD) if $\mathcal{M}(P^2) \geq 0$
for all $P \in \mathcal{P}(n, r)$.

4. For 0 ≤ r ≤ n, let $\binom{[n]}{r}$ and $\binom{[n]}{\leq r}$ denote all subsets of [n] of size exactly r and at most r, respectively.

5. For 0 ≤ r ≤ n, $\mathbb{R}^{\binom{[n]}{r} \times \binom{[n]}{r}}$ denotes matrices with rows and columns indexed by subsets of [n]
of size exactly r. Similarly, $\mathbb{R}^{\binom{[n]}{\leq r} \times \binom{[n]}{\leq r}}$ denotes matrices with rows and columns indexed by
subsets of [n] of size at most r.

6. We will view linear functionals $\mathcal{M} : \mathcal{P}(n, 2r) \to \mathbb{R}$ as matrices $M \in \mathbb{R}^{\binom{[n]}{\leq r} \times \binom{[n]}{\leq r}}$, where for
$I, J \in \binom{[n]}{\leq r}$, $M_{IJ} = \mathcal{M}\big(\prod_{i \in I \cup J} x_i\big)$. In general, this correspondence is not bijective. However,
as we only deal with mappings which are invariant under multilinear extension throughout,
the correspondence is one-to-one. It is a standard fact that a mapping $\mathcal{M}$ is PSD if and only
if the matrix M is PSD.

7. For $I \subseteq [n]$, let $X_I = \prod_{i \in I} x_i$.

8. By default all vectors are column vectors. For a set I, 1(I) denotes the indicator vector of
the set I.

9. For a matrix A, $\bar{A}$ denotes its conjugate matrix.


We will also need the following standard fact from matrix theory (see [GVL96] for instance).

Lemma 4.1 (special case of Gershgorin circle theorem). For any symmetric matrix $M \in \mathbb{R}^{N \times N}$,

$$\|M\| \leq \max_{i \in [N]} \sum_{j \in [N]} |M_{ij}|.$$
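A quick numerical sanity check of Lemma 4.1 (helper names are hypothetical): for a symmetric matrix, any power-iteration ratio $\|Mv\|/\|v\|$ is a lower bound on $\|M\|$ and therefore cannot exceed the maximum absolute row sum. The second example, a matrix whose first column is all ones, shows why symmetry matters: every row sum is 1, yet $\|M e_1\| = 3$.

```python
import math
import random

def gershgorin_bound(M):
    """Lemma 4.1: the maximum absolute row sum of M."""
    return max(sum(abs(v) for v in row) for row in M)

def apply(M, v):
    return [sum(a * b for a, b in zip(row, v)) for row in M]

def norm(v):
    return math.sqrt(sum(x * x for x in v))

def power_iteration_lower_bound(M, iters=100, seed=1):
    """Return ||M v|| / ||v|| for the last power-iteration vector v; any such ratio is <= ||M||."""
    rng = random.Random(seed)
    v = [rng.gauss(0.0, 1.0) for _ in M]
    ratio = 0.0
    for _ in range(iters):
        w = apply(M, v)
        nw = norm(w)
        if nw == 0.0:
            return 0.0
        ratio = nw / norm(v)
        v = [x / nw for x in w]
    return ratio

# A random symmetric 30 x 30 matrix with entries in [-1, 1].
rng = random.Random(0)
n = 30
S = [[0.0] * n for _ in range(n)]
for i in range(n):
    for j in range(i, n):
        S[i][j] = S[j][i] = rng.uniform(-1.0, 1.0)
print(power_iteration_lower_bound(S) <= gershgorin_bound(S))  # True

# Non-symmetric counterexample: first column all ones.
C = [[1.0 if j == 0 else 0.0 for j in range(9)] for _ in range(9)]
e0 = [1.0] + [0.0] * 8
print(gershgorin_bound(C), norm(apply(C, e0)))  # 1.0 3.0
```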


Finally, we need McDiarmid’s inequality for obtaining tail bounds for functions of independent
random variables (see [?] for instance).

⁸Some are repeated from the introduction so as to have them in one place.
Theorem 4.2 (McDiarmid’s inequality). Let $X_1, \ldots, X_n$ be independent random variables and let
f be a function over the domain space of $(X_1, \ldots, X_n)$. Let $c_1, \ldots, c_n > 0$ be such that for all i and all
$x_1, \ldots, x_n, x_i'$,

$$|f(x_1, \ldots, x_{i-1}, x_i, x_{i+1}, \ldots, x_n) - f(x_1, \ldots, x_{i-1}, x_i', x_{i+1}, \ldots, x_n)| \leq c_i.$$

Then, for all t > 0,

$$\Pr\big[|f(X_1, \ldots, X_n) - \mathbb{E}[f]| \geq t\big] \leq 2\exp\left(\frac{-2t^2}{\sum_{i=1}^{n} c_i^2}\right).$$
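As an illustration of Theorem 4.2 (our example, not from the text), the number of edges of G ← G(n, 1/2) is a function of the $\binom{n}{2}$ independent edge indicators with every $c_i = 1$, so it deviates from its mean by at least t with probability at most $2\exp(-2t^2/\binom{n}{2})$. The sketch compares this bound with an empirical frequency from a seeded simulation; helper names and parameters are assumptions.

```python
import math
import random

def mcdiarmid_bound(t, cs):
    """Right-hand side of Theorem 4.2: 2 exp(-2 t^2 / sum_i c_i^2)."""
    return 2 * math.exp(-2 * t * t / sum(c * c for c in cs))

def edge_count_deviation_freq(n, t, trials, seed=0):
    """Empirical Pr[|e(G) - E e(G)| >= t] for G ~ G(n, 1/2); flipping one edge changes e(G) by 1."""
    rng = random.Random(seed)
    m = n * (n - 1) // 2
    mean = m / 2
    hits = 0
    for _ in range(trials):
        edges = sum(rng.random() < 0.5 for _ in range(m))
        if abs(edges - mean) >= t:
            hits += 1
    return hits / trials

n, t = 20, 10
bound = mcdiarmid_bound(t, [1] * (n * (n - 1) // 2))
freq = edge_count_deviation_freq(n, t, trials=2000)
print(round(bound, 3), freq)
```

The bound is far from tight for this simple statistic; the example only makes the roles of t and the $c_i$ concrete.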


5 Reduction to PSDness of M' 

In this section, we define the matrix M' and show that if M' is PSD then so is M. We use the 
following notations for brevity: For any set I C [n], let £{I) = {{i,j} : i ^ j G /}. For 0 < i < r, 



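For every $T \subseteq [n]$, let $M_T \in \mathbb{R}^{\binom{[n]}{r}\times\binom{[n]}{r}}$ with $M_T(I,J) = \beta(|I\cap J|)$ if $I\cup J \subseteq T$ and $G$ contains every edge in $\mathcal{E}(T)\setminus(\mathcal{E}(I)\cup\mathcal{E}(J))$ (i.e., the only edges in $T$ missing in $G$ are those with both endpoints in one of $I$ or $J$), and $M_T(I,J) = 0$ otherwise. We will study the matrix

$$M' = \sum_{T : |T| = 2r} M_T. \qquad (5.2)$$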

Intuitively, for every $I,J$, $M'(I,J)$ is what $M(I,J)$ would be had we added cliques on the subsets $I$ and $J$ to the graph. The above definition avoids the problem of the whole row and column corresponding to $I$ or $J$ becoming zero if either is not a clique, and it controls the variance of the entries. We now show that, to prove that $M$ is PSD, it suffices to show that $M'$ is PSD.

Lemma 5.1. If M' is PSD then M is PSD. 

Proof. As shown below, the nonzero part of $M$ is a principal submatrix of $M'$.

Proposition 5.2. Whenever $I$ and $J$ are cliques of size $r$ in $G$, $M'(I,J) = M(I,J)$.

Proof. Suppose that $I$ and $J$ are cliques in $G$. Then $M_T(I,J) = \beta(|I\cap J|)$ if $I\cup J \subseteq T$ and $T$ is a clique, and $0$ otherwise. Therefore,

$$M'(I,J) = \sum_{T}M_T(I,J) = \beta(|I\cap J|)\cdot\big|\{T : I\cup J \subseteq T,\ T \text{ clique}\}\big| = M(I,J).$$

□


Corollary 5.3. The nonzero part of $M$ is a principal submatrix of $M'$.
We now use the following elementary fact about matrices. 
Proposition 5.4. If $A$ is a principal submatrix of a symmetric matrix $B$, then the smallest eigenvalue of $A$ is at least as large as the smallest eigenvalue of $B$.

Proof. Without loss of generality (after permuting rows and columns), $A$ is the leading $l\times l$ block of the $m\times m$ matrix $B$, where $l \le m$. Let $v \in \mathbb{R}^{l}$ be a unit eigenvector of $A$ with minimal eigenvalue $\lambda_{\min}$. If we let $w \in \mathbb{R}^{m}$ be the extension of $v$ to $\mathbb{R}^{m}$ with zeros in the other coordinates, then $w^{T}Bw = v^{T}Av = \lambda_{\min}$. This implies that the smallest eigenvalue of $B$ is at most $\lambda_{\min}$, and the result follows. □

Combining Corollary 5.3 and Proposition 5.4, if M' is PSD then M is PSD, as needed. □ 
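Proposition 5.2 can be checked by brute force on a tiny graph. In the sketch below, `beta` is a hypothetical placeholder, because the identity holds for any choice of $\beta$. The toy graph (the complete graph on 8 vertices minus two edges) is also our own choice. The sketch also shows the point of $M'$: its entries can be nonzero even when $I$ is not a clique.

```python
from itertools import combinations


def pairs(s):
    """E(S): all unordered pairs {u, v}, u != v, inside S."""
    return {frozenset(e) for e in combinations(sorted(s), 2)}


def is_clique(s, edges):
    return pairs(s) <= edges


def beta(i):
    """Hypothetical stand-in for beta(i); Proposition 5.2 holds for any choice."""
    return 1.0 + i


def m_T(T, I, J, edges):
    """M_T(I, J): beta(|I ∩ J|) if I ∪ J ⊆ T and G has every edge of T not inside I or J."""
    if not (I | J) <= T:
        return 0.0
    required = pairs(T) - (pairs(I) | pairs(J))
    return beta(len(I & J)) if required <= edges else 0.0


def m_prime(I, J, edges, n, r):
    """M'(I, J) = sum over |T| = 2r of M_T(I, J) (Equation 5.2)."""
    return sum(m_T(frozenset(T), I, J, edges) for T in combinations(range(n), 2 * r))


def m_on_cliques(I, J, edges, n, r):
    """beta(|I ∩ J|) * #{T : |T| = 2r, I ∪ J ⊆ T, T a clique in G}."""
    count = sum(1 for T in combinations(range(n), 2 * r)
                if (I | J) <= set(T) and is_clique(T, edges))
    return beta(len(I & J)) * count


n, r = 8, 2
edges = pairs(range(n)) - {frozenset((0, 1)), frozenset((2, 3))}
cliques = [frozenset(c) for c in combinations(range(n), r) if is_clique(c, edges)]
agree = all(m_prime(I, J, edges, n, r) == m_on_cliques(I, J, edges, n, r)
            for I in cliques for J in cliques)
print("Proposition 5.2 holds on this graph:", agree)
```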


6 Johnson scheme 


The theory of association schemes is a classical area in combinatorics and coding theory (cf. for instance [vLW01]). We shall use a few classical results (Lemmas 6.6 and 6.7 below) about the eigenspaces and eigenvalues of association schemes, and of the Johnson scheme in particular. We also introduce two bases for the Johnson scheme, which will play a key role in bounding the eigenvalues of various matrices later.

We start with some basics about the Johnson scheme; some of our notations are non-standard, but they fit better with the rest of the manuscript.

Definition 6.1 (Set-Symmetry). A matrix $M \in \mathbb{R}^{\binom{[n]}{r}\times\binom{[n]}{r}}$ is set-symmetric if for every $I,J \in \binom{[n]}{r}$, $M(I,J)$ depends only on $|I\cap J|$.

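Definition 6.2 (Johnson Scheme). For $n$ and $r \le n/2$, let $\mathcal{J} = \mathcal{J}_{n,r} \subseteq \mathbb{R}^{\binom{[n]}{r}\times\binom{[n]}{r}}$ be the subspace of all set-symmetric matrices. $\mathcal{J}$ is called the Johnson scheme.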

As we will soon see, $\mathcal{J}$ is also a commutative algebra. There is a natural basis for the subspace $\mathcal{J}$:

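Definition 6.3 (D-Basis). For $0 \le \ell \le r \le n$, let $D_\ell = D_{n,r,\ell} \in \mathbb{R}^{\binom{[n]}{r}\times\binom{[n]}{r}}$ be defined by

$$D_\ell(I,J) = \begin{cases} 1 & |I\cap J| = \ell,\\ 0 & \text{otherwise.}\end{cases} \qquad (6.1)$$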


For example, $D_0$ is the well-studied disjointness matrix. Clearly, $\{D_\ell : 0 \le \ell \le r\}$ spans the subspace $\mathcal{J}$. Also, it is easy to check that the $D_\ell$'s, and hence all the matrices in $\mathcal{J}$, commute with one another.
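The following is a small numerical illustration on a toy instance ($n = 5$, $r = 2$, chosen by us) of the facts just stated. The $D_\ell$ partition the all-ones matrix, and products of basis elements commute and stay set-symmetric. Rows and columns are indexed by $r$-subsets of $\{0,\dots,n-1\}$.

```python
from itertools import combinations


def r_subsets(n, r):
    """All r-subsets of {0, ..., n-1}, as frozensets, in a fixed order."""
    return [frozenset(c) for c in combinations(range(n), r)]


def d_matrix(n, r, ell):
    """D_ell(I, J) = 1 if |I ∩ J| = ell, else 0 (Definition 6.3)."""
    idx = r_subsets(n, r)
    return [[1 if len(I & J) == ell else 0 for J in idx] for I in idx]


def matmul(A, B):
    cols = list(zip(*B))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in A]


def is_set_symmetric(M, n, r):
    """Definition 6.1: M(I, J) depends only on |I ∩ J|."""
    idx = r_subsets(n, r)
    value_for_size = {}
    for a, I in enumerate(idx):
        for b, J in enumerate(idx):
            if value_for_size.setdefault(len(I & J), M[a][b]) != M[a][b]:
                return False
    return True


n, r = 5, 2
D = [d_matrix(n, r, ell) for ell in range(r + 1)]
size = len(D[0])
# Every pair (I, J) has exactly one intersection size, so the D_ell sum to the all-ones matrix.
partition_ok = all(sum(D[ell][i][j] for ell in range(r + 1)) == 1
                   for i in range(size) for j in range(size))
# Products of basis elements commute and remain set-symmetric.
algebra_ok = all(matmul(D[x], D[y]) == matmul(D[y], D[x])
                 and is_set_symmetric(matmul(D[x], D[y]), n, r)
                 for x in range(r + 1) for y in range(r + 1))
print(partition_ok, algebra_ok)
```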

Another important collection of matrices that comes up naturally while studying the PSD'ness of set-symmetric matrices is the following, which gives a basis of PSD matrices for the Johnson scheme.

Definition 6.4 (P-Basis). For $0 \le t \le r$, let $P_t = P_{n,r,t} \in \mathbb{R}^{\binom{[n]}{r}\times\binom{[n]}{r}}$ be defined by

$$P_t(I,J) = \binom{|I\cap J|}{t}.$$


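Equivalently, for $T \subseteq [n]$, if we let $P_T$ be the PSD rank-one matrix

$$P_T = 1\big(\{I \in \tbinom{[n]}{r} : I \supseteq T\}\big)\cdot 1\big(\{I \in \tbinom{[n]}{r} : I \supseteq T\}\big)^{\top},$$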


Footnote: We will often omit the subscripts $n, r$.
then

$$P_t = \sum_{T \subseteq [n],\ |T| = t} P_T. \qquad (6.2)$$

The equivalence of the above two definitions follows from a simple calculation: the $T$'th summand in Equation 6.2 contributes $1$ to the $(I,J)$'th entry if and only if $T \subseteq I\cap J$, and there are exactly $\binom{|I\cap J|}{t}$ such sets $T$. Clearly, $P_t \succeq 0$ for $0 \le t \le r$. We will exploit this relation repeatedly by expressing matrices in $\mathcal{J}$ as linear combinations of the $P_t$'s. The following elementary claim relates the two bases $\{D_\ell\}$ and $\{P_t\}$ for fixed $n, r$.
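The two descriptions of $P_t$ can be compared directly: $P_t(I,J) = \binom{|I\cap J|}{t}$, and $P_t$ as a sum of the rank-one matrices $P_T = u_T u_T^{\top}$ over $|T| = t$, with $u_T$ the indicator of $\{I : I \supseteq T\}$. The sketch below builds both on a small instance we picked ($n = 6$, $r = 3$) and checks that they coincide.

```python
from itertools import combinations
from math import comb


def subsets(n, k):
    return [frozenset(c) for c in combinations(range(n), k)]


def p_direct(n, r, t):
    """P_t(I, J) = C(|I ∩ J|, t)."""
    idx = subsets(n, r)
    return [[comb(len(I & J), t) for J in idx] for I in idx]


def p_from_rank_one(n, r, t):
    """Equation 6.2: P_t = sum over |T| = t of u_T u_T^T, u_T = indicator of {I : I ⊇ T}."""
    idx = subsets(n, r)
    P = [[0] * len(idx) for _ in idx]
    for T in subsets(n, t):
        u = [1 if T <= I else 0 for I in idx]
        for a in range(len(idx)):
            for b in range(len(idx)):
                P[a][b] += u[a] * u[b]
    return P


n, r = 6, 3
same = [p_direct(n, r, t) == p_from_rank_one(n, r, t) for t in range(r + 1)]
print(same)
```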

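Claim 6.5. For fixed $n, r$, the following relations hold:

1. For $0 \le t \le r$, $P_t = \sum_{\ell=t}^{r}\binom{\ell}{t}D_\ell$.

2. For $0 \le \ell \le r$, $D_\ell = \sum_{t=\ell}^{r}(-1)^{t-\ell}\binom{t}{\ell}P_t$.

Proof. The first relation follows immediately from the definition of $P_t$. The second relation follows from inverting the set of equations given in (1). □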
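Both relations of Claim 6.5 can be verified in exact integer arithmetic: $P_t = \sum_{\ell\ge t}\binom{\ell}{t}D_\ell$, and its binomial inversion $D_\ell = \sum_{t\ge \ell}(-1)^{t-\ell}\binom{t}{\ell}P_t$. The helper names below are ours, and the instance ($n = 7$, $r = 3$) is arbitrary.

```python
from itertools import combinations
from math import comb


def basis(n, r):
    """Return (D, P) with D[l](I, J) = [|I ∩ J| = l] and P[t](I, J) = C(|I ∩ J|, t)."""
    idx = [frozenset(c) for c in combinations(range(n), r)]
    sizes = [[len(I & J) for J in idx] for I in idx]
    D = [[[int(s == l) for s in row] for row in sizes] for l in range(r + 1)]
    P = [[[comb(s, t) for s in row] for row in sizes] for t in range(r + 1)]
    return D, P


def combine(coeffs, mats):
    """Entrywise sum_k coeffs[k] * mats[k]."""
    rows, cols = len(mats[0]), len(mats[0][0])
    return [[sum(c * M[i][j] for c, M in zip(coeffs, mats)) for j in range(cols)]
            for i in range(rows)]


n, r = 7, 3
D, P = basis(n, r)
# Relation 1: P_t = sum_{l >= t} C(l, t) D_l   (math.comb(l, t) is 0 when l < t).
rel1 = all(P[t] == combine([comb(l, t) for l in range(r + 1)], D) for t in range(r + 1))
# Relation 2: D_l = sum_{t >= l} (-1)^(t-l) C(t, l) P_t.
rel2 = all(D[l] == combine([(-1) ** (t - l) * comb(t, l) if t >= l else 0
                            for t in range(r + 1)], P)
           for l in range(r + 1))
print(rel1, rel2)
```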

The main nontrivial result from the theory of association schemes that we use is the following characterization of the eigenspaces of matrices in $\mathcal{J}$. The starting point for these characterizations is the fact that the (symmetric) matrices in $\mathcal{J}$ commute with one another and hence are simultaneously diagonalizable. We refer the reader to Section 7.4 in [God] (the matrices $P_t$ in our notation correspond to the matrices $C_t$ in [God]) for the proofs of these results.

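Lemma 6.6. Fix $n$ and $r \le n/2$, and let $\mathcal{J} = \mathcal{J}(n,r)$ be the Johnson scheme. Then, for $P_t$ as defined by Equation 6.2, there exist subspaces $V_0, V_1, \dots, V_r \subseteq \mathbb{R}^{\binom{[n]}{r}}$ that are orthogonal to one another such that:

1. $V_0,\dots,V_r$ are eigenspaces for $\{P_t : 0 \le t \le r\}$ and consequently for all matrices in $\mathcal{J}$.

2. For $0 \le j \le r$, $\dim(V_j) = \binom{n}{j} - \binom{n}{j-1}$.

3. For any matrix $Q \in \mathcal{J}$, let $\lambda_j(Q)$ denote the eigenvalue of $Q$ within the eigenspace $V_j$. Then,

$$\lambda_j(P_t) = \begin{cases}\binom{r-j}{t-j}\binom{n-t-j}{r-t} & j \le t,\\ 0 & j > t.\end{cases} \qquad (6.3)$$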
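Lemma 6.6 can be checked numerically on small cases. The sketch computes the spectrum of $P_t$ with a hand-written Jacobi eigenvalue routine (our own helper). It compares that spectrum with the multiset in which $\binom{r-j}{t-j}\binom{n-t-j}{r-t}$ (for $j \le t$, and $0$ otherwise) is repeated $\dim V_j = \binom{n}{j} - \binom{n}{j-1}$ times. The instances $(n, r) = (6, 2)$ and $(6, 3)$ are arbitrary choices satisfying $r \le n/2$.

```python
import math
from itertools import combinations
from math import comb


def jacobi_eigenvalues(a, sweeps=60, tol=1e-20):
    """Eigenvalues of a real symmetric matrix via the cyclic Jacobi method."""
    n = len(a)
    a = [[float(x) for x in row] for row in a]
    for _ in range(sweeps):
        if sum(a[i][j] ** 2 for i in range(n) for j in range(n) if i != j) < tol:
            break
        for p in range(n):
            for q in range(p + 1, n):
                if abs(a[p][q]) < 1e-300:
                    continue
                theta = (a[q][q] - a[p][p]) / (2.0 * a[p][q])
                t = math.copysign(1.0, theta) / (abs(theta) + math.sqrt(theta * theta + 1.0))
                c = 1.0 / math.sqrt(t * t + 1.0)
                s = t * c
                for k in range(n):  # rotate columns p and q
                    akp, akq = a[k][p], a[k][q]
                    a[k][p], a[k][q] = c * akp - s * akq, s * akp + c * akq
                for k in range(n):  # rotate rows p and q
                    apk, aqk = a[p][k], a[q][k]
                    a[p][k], a[q][k] = c * apk - s * aqk, s * apk + c * aqk
    return sorted(a[i][i] for i in range(n))


def p_matrix(n, r, t):
    """P_t(I, J) = C(|I ∩ J|, t) on the r-subsets of {0, ..., n-1}."""
    idx = [frozenset(c) for c in combinations(range(n), r)]
    return [[comb(len(I & J), t) for J in idx] for I in idx]


def predicted_spectrum(n, r, t):
    """Eigenvalues of P_t from Lemma 6.6, each repeated dim(V_j) times."""
    spec = []
    for j in range(r + 1):
        dim = comb(n, j) - (comb(n, j - 1) if j >= 1 else 0)
        lam = comb(r - j, t - j) * comb(n - t - j, r - t) if j <= t else 0
        spec.extend([lam] * dim)
    return sorted(spec)


def spectrum_matches(n, r, t, tol=1e-7):
    computed = jacobi_eigenvalues(p_matrix(n, r, t))
    predicted = predicted_spectrum(n, r, t)
    return len(computed) == len(predicted) and all(
        abs(x - y) < tol for x, y in zip(computed, predicted))


for n, r in [(6, 2), (6, 3)]:
    for t in range(r + 1):
        print(n, r, t, spectrum_matches(n, r, t))
```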

The above lemma helps us estimate the eigenvalues of any matrix $Q \in \mathcal{J}$ if we can write $Q$ as a linear combination of the $P_t$'s or $D_\ell$'s. To this end, we shall also use the following estimate on the eigenvalues of such linear combinations.

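Lemma 6.7. Let $Q = \sum_{\ell=0}^{r}a_\ell D_\ell \in \mathcal{J}(n,r)$ and $\beta_t = \sum_{\ell=0}^{t}(-1)^{t-\ell}\binom{t}{\ell}a_\ell$, where $a_\ell > 0$. Then, for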
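Proof. By Claim 6.5,

$$\sum_{\ell}a_\ell D_\ell = \sum_{\ell}a_\ell\Big(\sum_{t\ge\ell}(-1)^{t-\ell}\binom{t}{\ell}P_t\Big) = \sum_{t}P_t\Big(\sum_{\ell\le t}(-1)^{t-\ell}\binom{t}{\ell}a_\ell\Big) = \sum_t \beta_t P_t.$$

Therefore, as $Q$ and the $P_t$'s have common eigenspaces, by Lemma 6.6,

$$\lambda_j(Q) = \lambda_j\Big(\sum_t\beta_t P_t\Big) \le \sum_t|\beta_t|\,\lambda_j(P_t) = \sum_{t\ge j}|\beta_t|\binom{n-t-j}{r-t}\binom{r-j}{t-j}.$$

□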


7 PSD’ness of the expectation matrices 


In this section we show that if $r$ is not too large then the expectation matrix $E = \mathbb{E}[M']$ is PSD with a large minimal eigenvalue. As a warmup, we first show that the expectation matrix $E_M = \mathbb{E}[M]$ is PSD. We start by writing down $E_M$.

Claim 7.1. For $I,J \in \binom{[n]}{r}$ and $E_M = \mathbb{E}[M]$,


Em{I,J) 



(7.1) 


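Proof. The claim follows from observing that for all $I$ and $J$,

$$\mathbb{E}[\deg_G(I\cup J)] = \binom{n-|I\cup J|}{2r-|I\cup J|}\cdot 2^{-\binom{2r}{2}}.$$

To see this, note that for all $I$ and $J$ there are $\binom{n-|I\cup J|}{2r-|I\cup J|}$ sets of size $2r$ containing $I\cup J$, and each is a clique with probability $2^{-\binom{2r}{2}}$. □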


The expectation matrix above is just a scalar multiple of the functional defined in Equation 2.1 (viewed as a matrix). Therefore, by Theorem 2.1, $E_M$ as defined above is PSD for $r < \min(k, n-k)$. We give a simpler proof of this claim here for the case when $r < \min(k/2, n-k)$.

Theorem 7.2. The matrix $E_M$ is positive definite for $r < \min(k/2, n-k)$.

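Proof. We will show this by writing $E_M$ as a suitable positive linear combination of the PSD matrices $P_t$ from Section 6. More concretely, by Claim 6.5, for any $\alpha_0,\dots,\alpha_r \ge 0$ we have

$$\sum_{t=0}^{r}\alpha_t P_t = \sum_{\ell=0}^{r}\Big(\sum_{t=0}^{\ell}\binom{\ell}{t}\alpha_t\Big)D_\ell.$$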


Now, let $e_\ell = E_M(I,J)$ for any $I$ and $J$ with $|I\cup J| = 2r-\ell$ (equivalently, $|I\cap J| = \ell$), i.e.,

-( 2 ’') . + . ^2r-^ 


eg = 2 


/ 2r \ 
\2r-l) 
Then, $E_M = \sum_{\ell=0}^{r}e_\ell D_\ell$. Therefore, we will be done if we can find $\alpha_t \ge 0$ such that for every $0 \le \ell \le r$, $e_\ell = \sum_{t=0}^{\ell}\binom{\ell}{t}\alpha_t$. By examining the first few values of $\ell$, it is easy to guess what the $\alpha_t$ should be. First
observe that = eo • ~ (”7^) / ■ Then, 


— k\ /k — 2r + t\ 
t )'[ i-t ) 

k — 2r + t\ kk — 2r + £ 

t Jv 

A £k-2r + £ 

t)\ ^ 

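Therefore, $e_\ell = \sum_{t=0}^{\ell}\binom{\ell}{t}\alpha_t$, and the lemma now follows:

$$E_M = \sum_{\ell=0}^{r}e_\ell D_\ell = \sum_{\ell=0}^{r}\Big(\sum_{t=0}^{\ell}\binom{\ell}{t}\alpha_t\Big)D_\ell = \sum_{t=0}^{r}\alpha_t P_t.$$

□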



7.1 PSD’ness of E 


Now that we have shown that $E_M$ is positive definite when $r$ is not too large, we use similar ideas to analyze $E = \mathbb{E}[M']$, the expectation matrix we will actually be using. We begin by writing down $E$.


Claim 7.3. For $I,J \in \binom{[n]}{r}$ and $E = \mathbb{E}[M']$,

E{I, J) = 


n — |I U J| 
2r- |/U J| 


k \ 

|/uj|j _ 2 -^^- 


(|7UJ|) 


(Pn.|) 


(7.2) 


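Proof. The claim follows from observing that for all $I$ and $J$ with $|I\cap J| = i$, conditioned on the edges in $\mathcal{E}(I)$ and $\mathcal{E}(J)$ being present,

$$\mathbb{E}[\deg_G(I\cup J)] = \binom{n-|I\cup J|}{2r-|I\cup J|}\cdot 2^{-\binom{2r}{2}+\binom{r}{2}+\binom{r}{2}-\binom{i}{2}} = \binom{n-|I\cup J|}{2r-|I\cup J|}\cdot 2^{-r^2-\binom{i}{2}}.$$

To see this, note that for all $I$ and $J$ there are $\binom{n-|I\cup J|}{2r-|I\cup J|}$ sets of size $2r$ containing $I\cup J$. Since $|\mathcal{E}(I)\cup\mathcal{E}(J)| = 2\binom{r}{2}-\binom{i}{2}$, conditioned on the edges in $\mathcal{E}(I)$ and $\mathcal{E}(J)$ being present, each of them is a clique with probability $2^{-\binom{2r}{2}+\binom{r}{2}+\binom{r}{2}-\binom{i}{2}}$. Now note that $|I| = |J| = r$ and $-\binom{2r}{2}+2\binom{r}{2} = -(2r^2-r)+(r^2-r) = -r^2$. □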

Lemma 7 . 4 . If k < 3)!.2^-1 ^nd r < | then E is PSD with minimal eigenvalue k'^ 

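Proof. By Equation 7.2, $E = \sum_{\ell=0}^{r}e_\ell D_\ell$, where $e_\ell$ denotes the value of the right-hand side of (7.2) when $|I\cap J| = \ell$. We next express $E$ as a linear combination of the $P_t$'s: $E = \sum_t\alpha_t P_t$. By Claim 6.5, $D_\ell = \sum_{t=\ell}^{r}(-1)^{t-\ell}\binom{t}{\ell}P_t$, so

$$\alpha_t = \sum_{\ell=0}^{t}(-1)^{t-\ell}\binom{t}{\ell}e_\ell.$$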
Now note that for all ^ > 1, ^ • ^-L+i ' 2^ ^ ' e^-i = 2 i-hfc-^r+£) ' If ^

then the terms in the sum for $\alpha_t$ increase geometrically (in absolute value) by a factor of at least 3, and the sum will therefore be dominated by the last term. In particular, $\alpha_t \ge e_t(1 - 1/3 - 1/9 - \cdots) = e_t/2$. Thus, $\alpha_t > 0$ for all $t \in [0,r]$ and

1 . " '’V U . 

^ \ r J ( 2 ;) 

Since the $P_t$'s are PSD and $P_r = I$, we have $E \succeq \alpha_r P_r = \alpha_r I$, so $E$ is PSD with minimal eigenvalue at least $\alpha_r$, as needed. □
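The step "the sum is dominated by the last term" works as follows. If the magnitudes $\binom{t}{\ell}e_\ell$ grow by a factor of at least 3 as $\ell$ increases, then the alternating sum $\alpha_t = \sum_{\ell\le t}(-1)^{t-\ell}\binom{t}{\ell}e_\ell$ is at least $e_t(1 - 1/3 - 1/9 - \cdots) = e_t/2$. The sketch below checks this in exact arithmetic for hypothetical coefficients $e_\ell = 30^\ell$, which are not the actual $e_\ell$ of (7.2). For these coefficients the binomial theorem gives $\alpha_t = 29^t$.

```python
from fractions import Fraction
from math import comb


def alpha(e, t):
    """alpha_t = sum_{l=0}^{t} (-1)^(t-l) C(t, l) e_l (proof of Lemma 7.4)."""
    return sum((-1) ** (t - l) * comb(t, l) * e[l] for l in range(t + 1))


def grows_by_three(e, t):
    """True if the magnitudes C(t, l) e_l increase by a factor >= 3 as l increases."""
    terms = [comb(t, l) * e[l] for l in range(t + 1)]
    return all(terms[l] >= 3 * terms[l - 1] for l in range(1, t + 1))


# Hypothetical coefficients e_l = 30^l (illustration only).
r = 5
e = [Fraction(30) ** l for l in range(r + 1)]
for t in range(r + 1):
    print(t, grows_by_three(e, t), alpha(e, t) >= e[t] / 2)
```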



8 PSD’ness of dual certificate 


We are now ready to prove our main result, Theorem 1.5, with the aid of several technical results whose proofs are deferred to Section 9 and Section 10. We prove Theorem 1.5 by showing that the matrix $M$ is PSD with high probability (Theorem 2.5). In turn, we show that $M$ is PSD with high probability using our main technical lemma, which says that $M'$ is PSD with high probability (this is sufficient by Lemma 5.1).

Lemma 8.1 (Main Technical Lemma). For $c$ a sufficiently large constant the following holds. The matrix $M' \in \mathbb{R}^{\binom{[n]}{r}\times\binom{[n]}{r}}$ defined by Equation 5.2 is positive definite with high probability, for $k \le 2^{-cr}(\sqrt{n}/\log n)^{1/r}$.

To prove Lemma 8.1, we first decompose $M'$ as $M' = E + L + \Delta$ in Section 8.1. We then analyze $L$ and $\Delta$ in Section 8.2 and Section 8.3 respectively. We put all the pieces together to show the PSD'ness of $M'$ in Section 8.4.

For the remainder of this section, we shall use the following additional notation:


For $0 \le i \le r$, let

(8.1)


For $0 \le i \le r$, let $p(i) = 2^{-(r-i)^2}$. Then, for $I,J \in \binom{[n]}{r}$ with $|I\cap J| = i$, $p(i)$ is the probability that $\mathcal{E}(I\cup J)\setminus(\mathcal{E}(I)\cup\mathcal{E}(J)) \subseteq G$.
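The edge counts behind $p(i)$ and behind Claim 7.3 can be checked by brute force. For $|I| = |J| = r$ and $|I\cap J| = i$, we have $|\mathcal{E}(I)\cup\mathcal{E}(J)| = 2\binom{r}{2} - \binom{i}{2}$. The set $\mathcal{E}(I\cup J)\setminus(\mathcal{E}(I)\cup\mathcal{E}(J))$ is exactly the $(r-i)^2$ pairs between $I\setminus J$ and $J\setminus I$, so it lies in $G(n,1/2)$ with probability $2^{-(r-i)^2}$. The concrete sets below are our own choice.

```python
from itertools import combinations
from math import comb


def pairs(s):
    """E(S): unordered pairs {u, v} with u != v in S."""
    return {frozenset(e) for e in combinations(sorted(s), 2)}


def edge_counts(r, i):
    """(|E(I) ∪ E(J)|, |E(I ∪ J) minus (E(I) ∪ E(J))|) for |I| = |J| = r, |I ∩ J| = i."""
    I = frozenset(range(r))
    J = frozenset(range(r - i, 2 * r - i))  # overlaps I in exactly i elements
    inner = pairs(I) | pairs(J)
    return len(inner), len(pairs(I | J) - inner)


r = 4
table = {i: edge_counts(r, i) for i in range(r + 1)}
print(table)
```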

In the following we will adopt the convention that $I,J,K$ denote elements of $\binom{[n]}{r}$ and $T,T'$ denote elements of $\binom{[n]}{2r}$. All matrices considered below will be over $\mathbb{R}^{\binom{[n]}{r}\times\binom{[n]}{r}}$ unless otherwise specified.

We write $A \asymp_r B$ if there exist constants $c, C$ such that $c^{r}B \le A \le C^{r}B$.


8.1 Decomposition of M' 

For the proof of Lemma 8.1 we decompose the matrix $M'$ as $M' = E + L + \Delta$, where (a) $E = \mathbb{E}[M']$ is the expectation matrix; (b) $L$ will be a “local” random matrix such that for sets $I, J$, $L(I,J)$ only depends on the edges between the vertices of $I\cup J$; and (c) $\Delta$ is a “global” error matrix whose entries are small in magnitude.
To this end, first observe that by Equation 7.2, for $E = \mathbb{E}[M']$,

$$E(I,J) = \alpha(|I\cap J|).$$

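Now, define $L \in \mathbb{R}^{\binom{[n]}{r}\times\binom{[n]}{r}}$ as follows: for $I,J \in \binom{[n]}{r}$,

$$L(I,J) = \begin{cases}\alpha(|I\cap J|)\cdot\dfrac{1-p(|I\cap J|)}{p(|I\cap J|)} & \text{if } \mathcal{E}(I\cup J)\setminus(\mathcal{E}(I)\cup\mathcal{E}(J)) \subseteq G, \qquad (8.2)\\[1ex] -\alpha(|I\cap J|) & \text{otherwise.} \qquad (8.3)\end{cases}$$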


Finally, define $\Delta = M' - E - L$. We have already shown in Section 7 (Lemma 7.4) that $E$ is PSD with a large minimal eigenvalue. There are now two remaining modular steps in the proof:

1. We bound $\|L\|$ (Lemma 8.2) by exploiting the recursive structure of the matrix $L$ and some careful trace calculations. This is the most technically intensive part of the proof.

2. We then bound $\|\Delta\|$ (Lemma 8.11). This is done by first showing that each entry of $\Delta$ is small in magnitude and then using Lemma 4.1.

The next two subsections address these two steps with the corresponding technical elements 
dealt with in Section 9 and Section 10 respectively. 


8.2 Bounding the norm of the locally random matrix L 

In this subsection, we bound the norm of the matrix L. 

Lemma 8.2. For some constant C > 0, with probability at least 1 — 1/n over the random graph G, 

||L|| <0(l)-2^^'-A:2^-n^-i^. 

vn 


We will prove the lemma by further decomposing $L$ according to the intersection sizes of the indexing sets and using the recursive structure of the matrix $M'$. To this end, we define the following closely related locally-random matrix. For $a \in [r]$, let $R_a \in \mathbb{R}^{\binom{[n]}{a}\times\binom{[n]}{a}}$ be the matrix supported only on disjoint sets and defined as follows: for $V,W \in \binom{[n]}{a}$,


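$$R_a(V,W) = \begin{cases}2^{a^2}-1 & \text{if } V\cap W = \emptyset \text{ and } \{\{v,w\} : v\in V, w\in W\} \subseteq G,\\ -1 & \text{if } V\cap W = \emptyset \text{ and } \{\{v,w\} : v\in V, w\in W\} \not\subseteq G,\\ 0 & \text{if } V\cap W \ne \emptyset.\end{cases} \qquad (8.4)$$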


In other words, for disjoint $V, W \in \binom{[n]}{a}$, the entry $R_a(V,W) = 2^{a^2}\cdot\mathbb{1}[V\times W \subseteq G] - 1$ is (up to a constant multiple) a shift of the indicator random variable which is 1 if all edges in $V\times W$ are in $G$ and 0 otherwise.
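Each entry of $R_a$ has mean zero. For disjoint $V, W$ the entry equals $2^{a^2}-1$ with probability $2^{-a^2}$ and $-1$ otherwise. The sketch below enumerates all $2^{a^2}$ edge patterns on $V\times W$ and computes the mean exactly. Representing edges as a set of frozenset pairs is our own choice.

```python
from fractions import Fraction


def r_a_entry(V, W, edges, a):
    """Entry R_a(V, W) of Equation (8.4); `edges` is a set of frozenset pairs."""
    if V & W:
        return 0
    full = all(frozenset((v, w)) in edges for v in V for w in W)
    return 2 ** (a * a) - 1 if full else -1


def exact_mean(a):
    """E[R_a(V, W)] for disjoint V, W, averaging over all 2^(a^2) edge patterns on V x W."""
    V, W = frozenset(range(a)), frozenset(range(a, 2 * a))
    cross = [frozenset((v, w)) for v in V for w in W]
    total = Fraction(0)
    for mask in range(2 ** len(cross)):
        edges = {e for k, e in enumerate(cross) if (mask >> k) & 1}
        total += r_a_entry(V, W, edges, a)
    return total / 2 ** len(cross)


print([exact_mean(a) for a in (1, 2, 3)])
```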

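Note that $\mathbb{E}[R_a] = 0$, since for disjoint $V, W$ all $a^2$ edges of $V\times W$ are present with probability exactly $2^{-a^2}$. The following technical claim, proved in Section 9, bounds the norm of $R_a$. The proof relies on computing the trace of powers of $R_a$.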


Claim 8.3 (See Section 9). If $n \ge 100$, for all $\varepsilon \in (0,1)$, Pr



> 2F+2a+2 In (n)^« I 


< e- 
Note that $2^{a^2}n^{a}$ is an easy bound for $\|R_a\|$ (each entry of the matrix is at most $2^{a^2}$ in magnitude and each row has at most $n^{a}$ nonzero entries); the main advantage of the claim is the multiplicative savings over this trivial bound.

In the remainder of this section we use the recursive structure of the matrix $L$ to prove Lemma 8.2, assuming the above claim. We first introduce some notation:


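For a matrix $X \in \mathbb{R}^{\binom{[n]}{r_1}\times\binom{[n]}{r_2}}$ and $0 \le i \le \min\{r_1,r_2\}$, let $X^{i} \in \mathbb{R}^{\binom{[n]}{r_1}\times\binom{[n]}{r_2}}$ be the matrix such that $X^{i}(I,J) = X(I,J)$ if $|I\cap J| = i$ and $0$ otherwise.

For a matrix $X \in \mathbb{R}^{\binom{[n]}{r_1-i}\times\binom{[n]}{r_2-i}}$, let $X^{(i)} \in \mathbb{R}^{\binom{[n]}{r_1}\times\binom{[n]}{r_2}}$ be defined as follows:

$$X^{(i)}(I,J) = \begin{cases}X\big(I\setminus(I\cap J),\ J\setminus(I\cap J)\big) & \text{if } |I\cap J| = i,\\ 0 & \text{otherwise.}\end{cases} \qquad (8.5)$$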


The next lemma relates the norms of “lifts” of a matrix $R$. Conceptually, bounding the norms of matrices with non-zero entries on intersecting indexing sets is reduced to the disjoint case. Note that the requirement $R = R^{0}$ exactly captures the latter.

Lemma 8.4. For $0 \le i \le \min\{r_1,r_2\}$ and $R \in \mathbb{R}^{\binom{[n]}{r_1-i}\times\binom{[n]}{r_2-i}}$, if $R = R^{0}$ then $\|R^{(i)}\| \le \binom{r_1}{i}\binom{r_2}{i}\|R\|$.

Proof. We partition the entries of $R^{(i)}$ as follows.

Definition 8.5. For any $X, Y, K$ such that $X \subseteq [1,r_1]$, $Y \subseteq [1,r_2]$, and $K \subseteq V(G)$ where $|K| = |X| = |Y| = i$, let $R^{(i)}_{X,Y,K}$ be the matrix such that the following is true:

1. $R^{(i)}_{X,Y,K}(I,J) = R(I\setminus K, J\setminus K)$ if $K = \{i_x : x\in X\} = \{j_y : y\in Y\}$, where $i_1,\dots,i_{r_1}$ are the elements of $I$ in increasing order and $j_1,\dots,j_{r_2}$ are the elements of $J$ in increasing order.

2. $R^{(i)}_{X,Y,K}(I,J) = 0$ otherwise.

Proposition 8.6. For all $X, Y, K$, $\|R^{(i)}_{X,Y,K}\| \le \|R\|$.

Proof. The nonzero part of $R^{(i)}_{X,Y,K}$ can be viewed as a submatrix of $R$, so it cannot have larger induced norm than $R$. □

Proposition 8.7. $R^{(i)} = \sum_{X,Y,K}R^{(i)}_{X,Y,K}$.

Proof. If $R^{(i)}(I,J) = 0$ then $\sum_{X,Y,K}R^{(i)}_{X,Y,K}(I,J) = 0$. If $R^{(i)}(I,J) \ne 0$ then $|I\cap J| = i$. This implies that $K = \{i_x : x\in X\} = \{j_y : y\in Y\}$ if and only if $K = I\cap J$, $X$ is the set of indices of $K$ in $I$, and $Y$ is the set of indices of $K$ in $J$, which happens for precisely one triple $X, Y, K$. Thus, $R^{(i)}_{X,Y,K}(I,J) = R^{(i)}(I,J)$ for precisely one $X, Y, K$ and is $0$ otherwise, so $R^{(i)} = \sum_{X,Y,K}R^{(i)}_{X,Y,K}$, as needed. □

Proposition 8.8. $\|R^{(i)}\| \le \sum_{X,Y}\big\|\sum_{K}R^{(i)}_{X,Y,K}\big\|$.

Footnote: For this paper, we will only use the case where $r_1 = r_2 = r$. We put in this extra generality with an eye towards future work.
Proposition 8.9. If $K_1, K_2$ are distinct subsets of $V(G)$ of size $i$, $R^{(i)}_{X,Y,K_1}(I_1,J_1) \ne 0$, and $R^{(i)}_{X,Y,K_2}(I_2,J_2) \ne 0$, then $I_1 \ne I_2$ and $J_1 \ne J_2$.

Proof. Assume that $I_1 = I_2 = I$ and let $i_1,\dots,i_{r_1}$ be the elements of $I$ in increasing order. Then $K_1 = \{i_x : x\in X\} = K_2$, a contradiction. Following similar logic, we cannot have $J_1 = J_2$ either. □

Proposition 8.10. For any $X \subseteq [1,r_1]$ and $Y \subseteq [1,r_2]$, $\big\|\sum_{K}R^{(i)}_{X,Y,K}\big\| \le \|R\|$.

Proof. Note that we can permute the rows and columns of a matrix without affecting its induced norm. By Proposition 8.9, we can permute the rows and columns of $\sum_K R^{(i)}_{X,Y,K}$ into block-diagonal form, where each block is the nonzero part of $R^{(i)}_{X,Y,K}$ for some $K$. For a matrix in block-diagonal form, its norm is the maximum of the norms of the individual blocks, which by Proposition 8.6 is at most $\|R\|$, as needed. □

With these results, Lemma 8.4 follows immediately. Plugging Proposition 8.10 into Proposition 8.8 gives

$$\|R^{(i)}\| \le \sum_{X,Y}\Big\|\sum_K R^{(i)}_{X,Y,K}\Big\| \le \sum_{X,Y}\|R\| \le \binom{r_1}{i}\binom{r_2}{i}\|R\|,$$

as needed. □
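For a numerical sanity check of Lemma 8.4, the sketch takes a random symmetric $R$ supported on disjoint pairs (so $R = R^{0}$) and forms its lift $R^{(i)}$ from (8.5). It then compares $\|R^{(i)}\|$ with $\binom{r}{i}^2\|R\|$ for $r_1 = r_2 = r$. Norms come from a small hand-written Jacobi eigenvalue routine (our own helper), and the sizes and seed are arbitrary.

```python
import math
import random
from itertools import combinations
from math import comb


def jacobi_eigenvalues(a, sweeps=60, tol=1e-20):
    """Eigenvalues of a real symmetric matrix via the cyclic Jacobi method."""
    n = len(a)
    a = [[float(x) for x in row] for row in a]
    for _ in range(sweeps):
        if sum(a[i][j] ** 2 for i in range(n) for j in range(n) if i != j) < tol:
            break
        for p in range(n):
            for q in range(p + 1, n):
                if abs(a[p][q]) < 1e-300:
                    continue
                theta = (a[q][q] - a[p][p]) / (2.0 * a[p][q])
                t = math.copysign(1.0, theta) / (abs(theta) + math.sqrt(theta * theta + 1.0))
                c = 1.0 / math.sqrt(t * t + 1.0)
                s = t * c
                for k in range(n):  # rotate columns p and q
                    akp, akq = a[k][p], a[k][q]
                    a[k][p], a[k][q] = c * akp - s * akq, s * akp + c * akq
                for k in range(n):  # rotate rows p and q
                    apk, aqk = a[p][k], a[q][k]
                    a[p][k], a[q][k] = c * apk - s * aqk, s * apk + c * aqk
    return sorted(a[i][i] for i in range(n))


def spectral_norm(m):
    return max(abs(x) for x in jacobi_eigenvalues(m))


def lift(R, small_idx, big_idx, i):
    """R^{(i)}(I, J) = R(I minus (I ∩ J), J minus (I ∩ J)) if |I ∩ J| = i, else 0 (Eq. 8.5)."""
    pos = {S: k for k, S in enumerate(small_idx)}
    out = [[0.0] * len(big_idx) for _ in big_idx]
    for x, I in enumerate(big_idx):
        for y, J in enumerate(big_idx):
            K = I & J
            if len(K) == i:
                out[x][y] = R[pos[I - K]][pos[J - K]]
    return out


def check_lemma_8_4(n, r, i, seed=0):
    """Return (||R^{(i)}||, C(r, i)^2 ||R||) for a random symmetric R with R = R^0."""
    rng = random.Random(seed)
    small = [frozenset(c) for c in combinations(range(n), r - i)]
    big = [frozenset(c) for c in combinations(range(n), r)]
    R = [[0.0] * len(small) for _ in small]
    for x in range(len(small)):
        for y in range(x + 1, len(small)):
            if not (small[x] & small[y]):  # support only on disjoint pairs
                R[x][y] = R[y][x] = rng.uniform(-1.0, 1.0)
    return spectral_norm(lift(R, small, big, i)), comb(r, i) ** 2 * spectral_norm(R)


lhs, rhs = check_lemma_8_4(6, 2, 1)
print(f"||R^(1)|| = {lhs:.4f} <= {rhs:.4f}")
```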

We now use the above statements to prove Lemma 8.2. 

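Proof of Lemma 8.2. We claim that for $0 \le i < r$, and $\alpha(i)$ as in Equation 8.1,

$$L^{i} = \alpha(i)\cdot R^{(i)}_{r-i}. \qquad (8.6)$$

(For $i = r$ we simply have $L^{r} = 0$, since $p(r) = 1$ and hence every diagonal entry of $L$ vanishes.)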

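To see the above, fix $I,J \in \binom{[n]}{r}$ with $|I\cap J| = i$ and let $V = I\setminus(I\cap J)$, $W = J\setminus(I\cap J)$. Observe that

$$\mathcal{E}(I\cup J)\setminus(\mathcal{E}(I)\cup\mathcal{E}(J)) = \{\{v,w\} : v\in V, w\in W\}.$$

We consider two cases as in the definition of $L$.

Case 1. $\mathcal{E}(I\cup J)\setminus(\mathcal{E}(I)\cup\mathcal{E}(J)) \subseteq G$. Then, $R^{(i)}_{r-i}(I,J) = R_{r-i}(V,W) = 2^{(r-i)^2}-1 = (1-p(i))/p(i)$. Equation 8.6 now follows from the first case of the definition of $L$.

Case 2. $\mathcal{E}(I\cup J)\setminus(\mathcal{E}(I)\cup\mathcal{E}(J)) \not\subseteq G$. Then, $R^{(i)}_{r-i}(I,J) = R_{r-i}(V,W) = -1$. Equation 8.6 now follows from the second case of the definition of $L$.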

Therefore, by Claim 8.3, Lemma 8.4 and Equation 8.6,

\\V\\ < 0(1) • 2 ^^' • 

The lemma now follows from the triangle inequality, as $L = \sum_{i=0}^{r}L^{i}$. □

8.3 Bounding the norm of the global error matrix Δ

The main claim of this subsection is the following bound on the spectral norm of $\Delta$.

Lemma 8.11. For n > 02^'"^, with probability at least 1 — 1/n over the random graph G, 

||A|| < 2^"^ • 

Vn 
The proof relies on the following bound on the individual entries of $\Delta$.

Lemma 8.12. For some universal constant $C$ and all sufficiently large $n$, with probability at least $1 - 1/n$ over the random graph $G$, for all $I,J \in \binom{[n]}{r}$ with $i = |I\cap J|$,

vn 


Before proving the lemma, we first use it to bound $\|\Delta\|$.

Proof of Lemma 8.11. Suppose that the conclusion of the previous lemma holds. Then, for any $I \in \binom{[n]}{r}$,

$$\sum_J|\Delta(I,J)| = \sum_{i=0}^{r}\sum_{J:|I\cap J|=i}|\Delta(I,J)|$$


< 


i=0 J:|/nJ|=i 

< 2C-‘t^-0°gn) J^fn/kfr 


^/n 


i=0 


< 


(log n)n^ 
\/n 


The lemma now follows from the above bound and Lemma 4.1. □ 

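Proof of Lemma 8.12. Fix sets $I, J$ with $|I\cap J| = i$. Let $\mathcal{A}$ be the event that $\mathcal{E}(I\cup J)\setminus(\mathcal{E}(I)\cup\mathcal{E}(J)) \subseteq G$. If $\mathcal{A}$ does not hold, then $M'(I,J) = 0$ and, by the second case in the definition of $L$ (Equation 8.3), $L(I,J) = -E(I,J)$, so $\Delta(I,J) = 0$. Thus, the claim holds trivially in this case. In the following we condition on $\mathcal{A}$. Since $M'(I,J) = 0$ outside $\mathcal{A}$, we have

$$\mathbb{E}[M'(I,J)\mid\mathcal{A}] = E(I,J)/\Pr[\mathcal{A}] = E(I,J)/p(i).$$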


We next use the following claim, which states that $\deg_G(I\cup J)$ is concentrated around its mean when conditioned on $I\cup J$ being a clique. At a high level, this follows from the fact that, conditioned on $I\cup J$ being a clique, $\deg_G(I\cup J)$ can be written as a (structured) low-degree polynomial with small variance in the indicator variables of the edges not in $I\cup J$. We defer the proof to the appendix.

Claim 8.13 (See Theorem 10.1 of the appendix). For some constant $C > 0$,


Pr 


degG(/U J) -2-('’')+(% 0 . 


n — 2r A i 


> 2(ln{G/e)) n 




(I U J a clique) 


< e. 


As a consequence of the above claim we also get concentration for $M'(I,J)\mid\mathcal{A}$. This is because $M'(I,J)\mid\mathcal{A}$ is identically distributed as $\beta(i)\deg_G(I\cup J)\mid(I\cup J \text{ a clique})$. Therefore, taking $\varepsilon$ sufficiently small and applying a union bound over all sets $I, J$, we get that with probability at least $1 - 1/n$, for all $I, J$ such that $\mathcal{E}(I\cup J)\setminus(\mathcal{E}(I)\cup\mathcal{E}(J)) \subseteq G$ and $|I\cap J| = i$,


M^(I,J) - I3(i)2- 


(t) 


(%-0 


n — 2r i 


< Gr2 


2r^ 


,2r-i 


(log n) 


n 


i-l/2 
Finally, observe that

$$\beta(i)\,2^{-\binom{2r}{2}+\binom{2r-i}{2}}\binom{n-2r+i}{i} = \alpha(i)/p(i),$$

and that, conditioned on $\mathcal{A}$, $\Delta(I,J) = M'(I,J) - \alpha(|I\cap J|)/p(|I\cap J|)$, since $E(I,J) + L(I,J) = \alpha(i) + \alpha(i)(1-p(i))/p(i) = \alpha(i)/p(i)$. The lemma now follows by combining the above two bounds. □


8.4 Putting things together 

We now prove Lemma 8.1 and use it to prove our main results. 

Proof of Lemma 8.1. By Lemma 7.4, $E$ is PSD with the minimal eigenvalue given there. Therefore, by Lemma 8.2 and Lemma 8.11, with probability at least $1 - 2/n$, the least eigenvalue of $M' = E + L + \Delta$ is at least

$$\lambda_{\min}(E) - \|L\| - \|\Delta\| > 0,$$

for $k$ as in the statement of the lemma and a sufficiently large constant $c$.

□


We bring the arguments from previous sections together to prove our main results, Theorem 2.5 and Theorem 1.5.


Proof of Theorem 2.5. Follows immediately from Lemma 5.1 and Lemma 8.1. □ 

Proof of Theorem 1.5. Follows immediately from Lemma 1.8, Claim 2.2 and Theorem 2.5. □ 

Theorems 1.1 and 1.2 follow immediately from our PS(r)-refutation lower bound using standard 
arguments. We defer these to the appendix. 


9 Bounding norms of locally random matrices 

In this section we develop tools for bounding the norms of locally random matrices (recall their informal definition from Section 1.4 and the more formal one in Section 8.2) associated with random graphs $G \leftarrow G(n,1/2)$, proving Claim 8.3. The idea behind our bounds is to use the trace method. Recall the trace method: for any matrix $M$ and any positive integer $q$, $\|M\| \le \sqrt[2q]{\mathrm{tr}((M^{T}M)^{q})}$ (since $\mathrm{tr}((M^{T}M)^{q})$ is the sum of the $2q$-th powers of the singular values of $M$), so we can probabilistically bound $\|M\|$ by bounding $\mathbb{E}[\mathrm{tr}((M^{T}M)^{q})]$.
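The trace-method inequality can be seen on a small $R_1$-like matrix: zero diagonal, with independent uniform $\pm 1$ entries above it. The sketch computes $\mathrm{tr}((M^{T}M)^{q})^{1/(2q)}$ exactly for several $q$ and compares it with $\|M\|$, obtained from a hand-written Jacobi eigenvalue routine (our own helper). Every value is an upper bound on $\|M\|$, and the values decrease as $q$ grows. The size and seed are arbitrary.

```python
import math
import random


def jacobi_eigenvalues(a, sweeps=60, tol=1e-20):
    """Eigenvalues of a real symmetric matrix via the cyclic Jacobi method."""
    n = len(a)
    a = [[float(x) for x in row] for row in a]
    for _ in range(sweeps):
        if sum(a[i][j] ** 2 for i in range(n) for j in range(n) if i != j) < tol:
            break
        for p in range(n):
            for q in range(p + 1, n):
                if abs(a[p][q]) < 1e-300:
                    continue
                theta = (a[q][q] - a[p][p]) / (2.0 * a[p][q])
                t = math.copysign(1.0, theta) / (abs(theta) + math.sqrt(theta * theta + 1.0))
                c = 1.0 / math.sqrt(t * t + 1.0)
                s = t * c
                for k in range(n):  # rotate columns p and q
                    akp, akq = a[k][p], a[k][q]
                    a[k][p], a[k][q] = c * akp - s * akq, s * akp + c * akq
                for k in range(n):  # rotate rows p and q
                    apk, aqk = a[p][k], a[q][k]
                    a[p][k], a[q][k] = c * apk - s * aqk, s * apk + c * aqk
    return sorted(a[i][i] for i in range(n))


def matmul(A, B):
    cols = list(zip(*B))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in A]


def trace_bounds(M, q_max):
    """[tr((M^T M)^q)^(1/(2q)) for q = 1..q_max], with exact integer traces; M symmetric."""
    MtM = matmul(M, M)  # M is symmetric, so M^T M = M^2
    power, bounds = MtM, []
    for q in range(1, q_max + 1):
        tr = sum(power[k][k] for k in range(len(M)))
        bounds.append(math.exp(math.log(tr) / (2 * q)))
        power = matmul(power, MtM)
    return bounds


rng = random.Random(42)
n = 20
M = [[0] * n for _ in range(n)]
for i in range(n):
    for j in range(i + 1, n):
        M[i][j] = M[j][i] = rng.choice((-1, 1))

bounds = trace_bounds(M, 8)
norm = max(abs(x) for x in jacobi_eigenvalues(M))
print(f"||M|| = {norm:.3f}; trace bounds: {[round(b, 3) for b in bounds]}")
```

The case $q = 1$ is just the Frobenius norm. Larger $q$ gives sharper bounds, which is why the proof takes $q$ of order $\log n$.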

Going back to Claim 8.3, let us first look at the special case of $a = 1$ to gain some intuition. In this case, the entries of $R_1$ are (essentially) independent, and so the trace method is easy to apply. More precisely, $R_1$ is a symmetric random matrix with zeros on the diagonal and with the entries above the diagonal taking independent uniformly random $\pm 1$ values. It is well known that $\|R_1\| = O(\sqrt{n})$ in this case (see [Ver] for instance). One can also prove the bound by the trace method as follows. We have that

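$$\mathbb{E}[\mathrm{tr}((R_1^{T}R_1)^{q})] = \mathbb{E}[\mathrm{tr}(R_1^{2q})] = \sum_{i_1,\dots,i_{2q}}\mathbb{E}\Big[\prod_{j=1}^{2q}R_1(i_j,i_{j+1})\Big],$$

where $i_{2q+1} = i_1$. We can then look at which products $\prod_{j=1}^{2q}R_1(i_j,i_{j+1})$ have expectation 0.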
Since each individual $R_1(i_j,i_{j+1})$ is an independent $\pm 1$ random variable with expectation 0, a term $\mathbb{E}[\prod_{j=1}^{2q}R_1(i_j,i_{j+1})]$ in the summation is 0 unless every $R_1(i_j,i_{j+1})$ appears an even number of times in the product. Thus, the vast majority of the terms $\mathbb{E}[\prod_{j=1}^{2q}R_1(i_j,i_{j+1})]$ are 0, and we can count the remaining terms to bound $\mathbb{E}[\mathrm{tr}(R_1^{2q})]$.

One way to implement the above argument is to first look at the terms $\mathbb{E}[\prod_{j=1}^{2q}R_1(i_j,i_{j+1})]$ which have non-zero expectation and observe that in all such terms, the number of distinct elements in $\{i_j : j = 1,\dots,2q\}$ is at most $q+1$. We can then bound the number of terms with non-zero expected value by the number of possible terms which contain at most $q+1$ distinct elements. This number can be easily bounded by $O((nq)^{q+1})$, and picking $q$ optimally shows that with high probability $\|R_1\| = O(\sqrt{n}\log n)$, a near-optimal bound.

To handle higher a’s we first generalize the above argument based on constraint graphs to work 
with general locally-random matrices. However, unlike for a = 1, distinct entries of the matrix 
are now dependent, which significantly complicates the structure of the terms and the associated 
count of the terms which have non-zero expectation. The rest of the section is devoted to this. 
While we apply our arguments to the particular locally-random matrices arising in our proof, these 
techniques should apply more generally to other locally-random matrices. 


9.1 Constraint graphs 

We next state our main technical result, which gives us a way to bound traces of high powers of locally random matrices based on the structure of the individual terms. The advantage is that the conditions on the terms will be easier to ascertain in our applications.

Here we use V rather than I for subsets because we will be viewing the individual elements of 
each V as vertices. 


Theorem 9.1. Assume that we have values $a, B > 0$ and, for every positive integer $q$, a function $p(G,2q)$ such that $p(G,2q) \ge 0$ and $p(G,2q)$ can be written in the form

$$p(G,2q) = \sum_{\{V_1,\dots,V_{2q}\}}f(G,\{V_1,\dots,V_{2q}\}),$$

where the following are true:

1. $V_j \subseteq V(G)$ and $|V_j| = a$.

2. For every term $f(G,\{V_1,\dots,V_{2q}\})$ with non-zero expected value, $|\bigcup_j V_j| \le 2aq - qy + z$ for some integers $y$ and $z$ where $1 \le y \le 2a$ and $z \ge 0$.

3. $\mathbb{E}[f(G,\{V_1,\dots,V_{2q}\})] \le B^{2q}$.

Then, if $n \ge 10$, for all $\varepsilon \in (0,1)$,

$$\Pr\left[\min_{q\in\mathbb{Z}^{+}}\left\{\sqrt[2q]{p(G,2q)}\right\} > \frac{B}{a!}\left(2ea\left(\frac{\ln(n^{z}/\varepsilon)}{2y}+1\right)\right)^{y}n^{a-y/2}\right] < \varepsilon.$$


Remark 9.2. We will use this theorem with two types of functions $p$. When $p(G,2q) = \mathrm{tr}((M^{T}M)^{q})$ for some matrix $M$ depending on $G$, $\|M\| \le \sqrt[2q]{p(G,2q)}$ for all $q > 0$, so this theorem gives us a probabilistic bound on $\|M\|$. When $p(G,2q) = h(G)^{2q}$ for some function $h$, then $h(G) = \sqrt[2q]{p(G,2q)}$ for all $q > 0$, so this theorem gives us a probabilistic bound on $h(G)$.
Example 9.3. In the case when $p(G,2q) = \mathrm{tr}(R_1^{2q})$, $p(G,2q) = \sum_{i_1,\dots,i_{2q}}\prod_{j=1}^{2q}R_1(i_j,i_{j+1})$. Each term here has expected value at most 1, and it is easy to argue that for any term with non-zero expected value, the number of distinct elements is at most $q+1$. Applying Theorem 9.1 with $y = z = 1$ and $B = 1$, we have that for all $n \ge 10$ and $\varepsilon \in (0,1)$,

$$\Pr\big[\|R_1\| > 2e\sqrt{n}(\ln(n/\varepsilon)+2)\big] < \varepsilon.$$

This bound is weaker (by a logarithmic factor) than the bounds in e.g. [Ver], but is sufficient for our purposes.

Before proving the theorem we introduce the concept of constraint graphs which are a useful way 
to visualize our calculations. While the statement of the above theorem does not involve constraint 
graphs, thinking in terms of constraint graphs is helpful in proving the conditions required to apply 
the theorem. 

Definition 9.4. Given a family of sets of vertices $\{V_i\}$, we define a corresponding constraint graph $C$ whose vertices are the sets $\{V_i\}$, with an edge between $V_i$ and $V_j$, $i \ne j$, if $V_i\cap V_j \ne \emptyset$.

The above definition is useful because of the following elementary lemma.

Lemma 9.5. For any collection of sets $\{V_1,\dots,V_\ell\}$, if the corresponding constraint graph $C$ has $t$ connected components, then $|\bigcup_{k=1}^{\ell}V_k| \le \sum_{k=1}^{\ell}|V_k| - \ell + t$.

Proof. Relabel so that $V_1,\dots,V_t$ belong to the $t$ different connected components of $C$. Now add the remaining sets of $\{V_1,\dots,V_\ell\}$ so that each new set is adjacent (in $C$) to at least one of the previously added sets (we can do this as the number of connected components is $t$). Then, each such step adding a set $V_j$ can increase the size of the union by at most $|V_j| - 1$. Therefore, the size of the union is at most $\sum_k|V_k| - \ell + t$. □
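Lemma 9.5 is easy to test on random families. The sketch counts the connected components of the constraint graph with a small union-find and checks $|\bigcup_k V_k| \le \sum_k|V_k| - \ell + t$. The family sizes and seed are arbitrary choices of ours.

```python
import random


def components(sets):
    """Number of connected components of the constraint graph C (Definition 9.4)."""
    parent = list(range(len(sets)))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for i in range(len(sets)):
        for j in range(i + 1, len(sets)):
            if sets[i] & sets[j]:  # V_i and V_j intersect, so they are adjacent in C
                parent[find(i)] = find(j)
    return len({find(i) for i in range(len(sets))})


def lemma_9_5_holds(sets):
    """Check |union V_k| <= sum_k |V_k| - ell + t, with ell sets and t components."""
    union = set().union(*sets)
    return len(union) <= sum(len(s) for s in sets) - len(sets) + components(sets)


rng = random.Random(5)
results = []
for _ in range(500):
    ell = rng.randint(1, 6)
    family = [frozenset(rng.sample(range(10), rng.randint(1, 4))) for _ in range(ell)]
    results.append(lemma_9_5_holds(family))
print(all(results))
```

Equality holds, for example, when the sets are pairwise disjoint, or when they form a chain in which consecutive sets share exactly one element.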

Proof of Theorem 9.1. In the following we use $\{V_i\}$ as a short form for $\{V_1,\dots,V_{2q}\}$. We prove this result by obtaining an upper bound on the number of terms in $p(G,2q) = \sum_{\{V_i\}}f(G,\{V_i\})$ with nonzero expected value. This gives us a probabilistic upper bound for $p(G,2q)$, implying the upper bound on $\min_q\{\sqrt[2q]{p(G,2q)}\}$.

Definition 9.6. Define $N(n,a,q,m)$ to be the number of ways to choose subsets $\{V_i : i \in [2q]\}$ of $[n]$ such that $|\bigcup_i V_i| \le m$ and, for all $i$, $|V_i| = a$.

Lemma 9.7. If $m \le 2aq$, then

$$N(n,a,q,m) \le \Big(\frac{1}{a!}\Big)^{2q}\binom{2aq}{2aq-m}m^{2aq-m}n^{m}.$$

Proof. We can choose each ordered $2aq$-tuple $(v_1,\dots,v_{2aq})$ of elements in $[n]$ which contains at most $m$ distinct elements as follows. There must be at least $2aq - m$ elements which are duplicates of other elements, so we can first choose a set $I$ of $2aq - m$ indices such that for all $i \in I$, $v_i = v_j$ for some $j \notin I$. There are $\binom{2aq}{2aq-m}$ choices for $I$. We then choose the elements $\{v_j : j \notin I\}$. There are no restrictions on these elements, so there are $n^{m}$ choices for them. Finally, we choose the elements $\{v_i : i \in I\}$. To determine each $v_i$ it is sufficient to specify the $j \notin I$ such that $v_i = v_j$. For each $i$ there are $m$ choices for the corresponding $j$, so the number of choices for these elements is at most $m^{2aq-m}$. Putting everything together, the total number of choices is at most $\binom{2aq}{2aq-m}m^{2aq-m}n^{m}$. Now note that since we are choosing subsets $\{V_i : i \in [2q]\}$ of $[n]$ rather than one big ordered tuple, the order within each subset does not matter. Thus, there are $(a!)^{2q}$ different ordered tuples which give the same subsets of elements, so the total number of possibilities for the subsets $\{V_i\}$ is at most $(1/a!)^{2q}\binom{2aq}{2aq-m}m^{2aq-m}n^{m}$, as needed. □
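For tiny parameters, $N(n,a,q,m)$ can be counted by brute force over all $(V_1,\dots,V_{2q})$ and compared with the bound of Lemma 9.7, $(1/a!)^{2q}\binom{2aq}{2aq-m}m^{2aq-m}n^{m}$. The parameter choices below are ours and are kept small enough to enumerate.

```python
from itertools import combinations, product
from math import comb, factorial


def count_N(n, a, q, m):
    """Brute-force N(n, a, q, m): choices (V_1, ..., V_2q) of a-subsets of [n]
    whose union has at most m elements (Definition 9.6)."""
    subsets = [frozenset(c) for c in combinations(range(n), a)]
    return sum(1 for Vs in product(subsets, repeat=2 * q)
               if len(frozenset().union(*Vs)) <= m)


def lemma_9_7_bound(n, a, q, m):
    """(1/a!)^{2q} * C(2aq, 2aq - m) * m^{2aq - m} * n^m."""
    L = 2 * a * q
    return comb(L, L - m) * m ** (L - m) * n ** m / factorial(a) ** (2 * q)


cases = [(5, 2, 1, 3), (5, 2, 1, 4), (6, 1, 2, 2), (6, 1, 2, 3), (5, 2, 2, 4)]
for n, a, q, m in cases:
    print((n, a, q, m), count_N(n, a, q, m), "<=", lemma_9_7_bound(n, a, q, m))
```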

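Now $\mathbb{E}[p(G,2q)] = \sum_{\{V_i\}}\mathbb{E}[f(G,\{V_i\})]$. For every nonzero term $\mathbb{E}[f(G,\{V_i\})]$, we have that $|\bigcup_i V_i| \le 2aq - qy + z$. If $q \ge z/y$, then applying Lemma 9.7 with $m = 2aq - qy + z$, the number of non-zero terms $\mathbb{E}[f(G,\{V_i\})]$ is at most

$$\Big(\frac{1}{a!}\Big)^{2q}\binom{2aq}{2aq-m}m^{2aq-m}n^{m} \le \Big(\frac{1}{a!}\Big)^{2q}(2aqm)^{2aq-m}n^{m}.$$

Moreover, by our assumptions, each of these nonzero terms $\mathbb{E}[f(G,\{V_i\})]$ has value at most $B^{2q}$, so

$$\mathbb{E}[p(G,2q)] \le \Big(\frac{B}{a!}\Big)^{2q}(2aqm)^{2aq-m}n^{m}.$$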


Now, by Markov’s inequality applied to the nonnegative random variable $p(G,2q)$,

$$\Pr\Big[\sqrt[2q]{p(G,2q)} > \sqrt[2q]{\mathbb{E}[p(G,2q)]/\varepsilon}\Big] < \varepsilon.$$


We next choose a value $q$ so as to minimize our estimate on $\sqrt[2q]{\mathbb{E}[p(G,2q)]/\varepsilon}$. Specifically, we set $q = \lceil\ln(n^{z}/\varepsilon)/(2y)\rceil$ (we arrive at this value by minimizing the general estimate as a function of $q$, setting the derivative to 0; we spare the reader the details). As long as $n \ge 10$, this guarantees that $q \ge z/y$ (since $\ln(n^{z}/\varepsilon) > z\ln n \ge 2z$), so that


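$$\begin{aligned}\sqrt[2q]{\mathbb{E}[p(G,2q)]/\varepsilon} &\le \frac{B}{a!}\cdot(2aqm)^{(qy-z)/(2q)}\cdot n^{m/(2q)}\cdot\varepsilon^{-1/(2q)}\\ &= \frac{B}{a!}\cdot n^{a-y/2}\cdot(2aqm)^{(qy-z)/(2q)}\cdot(n^{z}/\varepsilon)^{1/(2q)}\\ &\le \frac{B}{a!}\cdot n^{a-y/2}\cdot(2aq)^{y}\cdot e^{y}\\ &\le \frac{B}{a!}\left(2ea\left(\frac{\ln(n^{z}/\varepsilon)}{2y}+1\right)\right)^{y}n^{a-y/2},\end{aligned}$$

where we used $2aq - m = qy - z$, $m \le 2aq$, $(n^{z}/\varepsilon)^{1/(2q)} \le e^{y}$ (by the choice of $q$), and $q \le \ln(n^{z}/\varepsilon)/(2y) + 1$. The claim now follows by rearranging the above bound.

□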
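The parameter choice in this proof can be checked numerically. For $q = \lceil\ln(n^{z}/\varepsilon)/(2y)\rceil$, the sketch confirms $q \ge z/y$. It then works in log space to confirm that the Markov estimate $\big((B/a!)^{2q}(2aqm)^{2aq-m}n^{m}/\varepsilon\big)^{1/(2q)}$, with $m = 2aq - qy + z$, is at most the closed form $\frac{B}{a!}\big(2ea(\frac{\ln(n^{z}/\varepsilon)}{2y}+1)\big)^{y}n^{a-y/2}$. That closed form is what the chain of inequalities reduces to under the steps stated there. The function names and the parameter grid are our own.

```python
import math


def chosen_q(n, y, z, eps):
    """q = ceil(ln(n^z / eps) / (2y)), as in the proof of Theorem 9.1."""
    return math.ceil((z * math.log(n) - math.log(eps)) / (2 * y))


def log_markov_estimate(n, a, q, y, z, B, eps):
    """log of ((B/a!)^{2q} (2aqm)^{2aq-m} n^m / eps)^{1/(2q)} with m = 2aq - qy + z."""
    m = 2 * a * q - q * y + z
    log_val = (2 * q * math.log(B / math.factorial(a))
               + (2 * a * q - m) * math.log(2 * a * q * m)
               + m * math.log(n) - math.log(eps))
    return log_val / (2 * q)


def log_final_bound(n, a, y, z, B, eps):
    """log of (B/a!) * (2ea(ln(n^z/eps)/(2y) + 1))^y * n^(a - y/2)."""
    L = z * math.log(n) - math.log(eps)
    return (math.log(B / math.factorial(a))
            + y * math.log(2 * math.e * a * (L / (2 * y) + 1))
            + (a - y / 2) * math.log(n))


n, a, y, z, B, eps = 100, 2, 1, 1, 16.0, 1e-3
q = chosen_q(n, y, z, eps)
print(q, log_markov_estimate(n, a, q, y, z, B, eps), log_final_bound(n, a, y, z, B, eps))
```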


9.2 Bounds on ||R_a||


In this subsection, we prove Claim 8.3 using Theorem 9.1. For convenience, we restate Claim 8.3 
here with more precise constants. 


Theorem 9.8. If $n \ge 100$, for all $\varepsilon \in (0,1)$, Pr



< e. 


The core of the proof will be to bound $|\bigcup_j V_j|$ for any term $\prod_j R_a(V_j,V_{j+1})$ with non-zero expectation which appears in the expansion of $\mathrm{tr}((R_a^{T}R_a)^{q})$. We will do so by arguing that the constraint graph associated with such a term has at most $q+1$ connected components (so that, by Lemma 9.5, $|\bigcup_j V_j| \le 2aq - q + 1$), which we do by inductively decomposing $R_a$ as follows.
Definition 9.9. Given a partition $(A,B)$ of $[1,n]$, define $R_{a,A,B}(V_1,V_2) = R_a(V_1,V_2)$ if $V_1 \subseteq A$ and $V_2 \subseteq B$, and $0$ otherwise.

Proposition 9.10. $\sum_{A,B}R_{a,A,B} = 2^{n-2a}R_a$.

Proof. $R_{a,A,B}(V_1,V_2) = R_a(V_1,V_2) = 0$ whenever $V_1$ and $V_2$ are not disjoint. For all disjoint $V_1$ and $V_2$, $R_{a,A,B}(V_1,V_2) = R_a(V_1,V_2)$ for $2^{n-2a}$ choices of $A$ and $B$ and is $0$ for the rest. □

Corollary 9.11. $\|R_a\| \le 2^{2a}\max_{A,B}\{\|R_{a,A,B}\|\}$.

Proof. Since $R_a = 2^{2a-n}\sum_{A,B}R_{a,A,B}$ and there are $2^{n}$ partitions $(A,B)$, $\|R_a\| \le 2^{2a-n}\sum_{A,B}\|R_{a,A,B}\| \le 2^{2a}\max_{A,B}\{\|R_{a,A,B}\|\}$. □

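Now, given $A$ and $B$, take

$$\begin{aligned}p(G,2q) = \mathrm{tr}\big((R_{a,A,B}^{T}R_{a,A,B})^{q}\big) &= \sum_{\{V_{ij} : i\in[1,q],\, j\in[1,2]\}}\ \prod_{i=1}^{q}R^{T}_{a,A,B}(V_{i1},V_{i2})\,R_{a,A,B}(V_{i2},V_{(i+1)1})\\ &= \sum_{\substack{\{V_{ij} : i\in[1,q],\, j\in[1,2]\}:\\ \forall i,\ V_{i1}\subseteq B,\ V_{i2}\subseteq A}}\ \prod_{i=1}^{q}R_a(V_{i1},V_{i2})\,R_a(V_{i2},V_{(i+1)1}),\end{aligned}$$

where we take $V_{(q+1)1} = V_{11}$ and use that $R_a$ is symmetric.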

To simplify this expression, rename the sets of vertices as follows. 

Definition 9.12.

1. If $i \in [1,2q]$ and $i$ is odd, then take $W_i = V_{((i+1)/2)1}$.

2. If $i \in [1,2q]$ and $i$ is even, then take $W_i = V_{(i/2)2}$.
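We now have that

$$p(G,2q) = \sum_{\substack{\{W_i : i\in[1,2q]\}:\\ \forall\,\text{odd } i,\ W_i\subseteq B\\ \forall\,\text{even } i,\ W_i\subseteq A}}\ \prod_{i=1}^{2q}R_a(W_i,W_{i+1}), \qquad (9.1)$$

where we take $W_{2q+1} = W_1$. To study which of these terms may have non-zero expectation, we first define a graph related to the corresponding constraint graph.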
first define a graph related to the corresponding constraint graph. 

Definition 9.13. Given a constraint graph $C$, let $H$ be a graph with two types of edges, product edges and constraint edges, such that

1. $V(H) = \{W_i : i \in [1,2q]\}$.

2. $E_p(H) = \{(W_i,W_{i+1}) : i \in [1,2q]\}$.

3. $E_c(H) = \{(W_i,W_j) : i \ne j,\ W_i\cap W_j \ne \emptyset\}$.
Now, each $R_a(W_i,W_{i+1})$ is a random variable with expectation 0, so if any $R_a(W_i,W_{i+1})$ is independent from everything else, the product has expectation 0. Dependencies arise when edges of $G$ occur in (at least two) different “elements” of the term, say $(W_i,W_{i+1})$ and $(W_j,W_{j+1})$ for $i \ne j$. Such repeated occurrences manifest in our constraint graphs (and in the graph $H$ defined above) as three- or four-cycles, which we call independence breaking. For a term to have non-zero expectation, every element $(W_i,W_{i+1})$ must lie on some such cycle. Consequently, each product $\prod_{i=1}^{2q}R_a(W_i,W_{i+1})$ has zero expected value unless all of the product edges in the corresponding $H$ are part of independence-breaking cycles. This places restrictions on $H$ (see Lemma 9.17), which in turn places restrictions on the constraint graph $C$, allowing us to use Theorem 9.1. We make these ideas precise below.

Definition 9.14. Given $q$ and $\{W_1,\dots,W_{2q}\}$, we define $W_{i+2q} = W_i$ for all $i \in [1,2q]$.

Definition 9.15. If $q \ge 2$:

1. Define an independence breaking 3-cycle in $H$ to consist of product edges $(W_i,W_{i+1})$, $(W_{i+1},W_{i+2})$ and a constraint edge $(W_i,W_{i+2})$.

2. Define an independence breaking 4-cycle to consist of product edges $e_1 = (W_{i_1},W_{i_1+1})$, $e_2 = (W_{i_2},W_{i_2+1})$ and constraint edges $(W_{i_1},W_{i_2})$ and $(W_{i_1+1},W_{i_2+1})$.

Proposition 9.16. For all $W_1, \ldots, W_{2q}$ such that $W_i \subseteq B$ whenever $i$ is odd and $W_i \subseteq A$ whenever $i$ is even, if the corresponding $H$ has a product edge $(W_i, W_{i+1})$ which is not contained in any independence-breaking cycle, then $\mathbb{E}\left[\prod_{j=1}^{2q} R_a(W_j, W_{j+1})\right] = 0$.

Proof. If $(W_i, W_{i+1})$ is not contained in any independence-breaking cycle, then no edge between $W_i$ and $W_{i+1}$ appears anywhere else, so $R_a(W_i, W_{i+1})$ is a random variable with expectation 0 which is independent from everything else; thus $\mathbb{E}\left[\prod_{j=1}^{2q} R_a(W_j, W_{j+1})\right] = 0$. □
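As a sanity check on the expansion in Equation 9.1, the sketch below compares $\operatorname{tr}((RR^T)^q)$ with the corresponding sum of cyclic products for a small random matrix. It assumes, as is standard in the trace method, that $p(G,2q)$ plays the role of $\operatorname{tr}((RR^T)^q)$ for $R = R_{a,A,B}$ with rows indexed by subsets of $B$ and columns by subsets of $A$, and that $R_a(W, W') = R_a(W', W)$. The function names are illustrative only.

```python
import itertools
import random


def trace_of_power(R, q):
    """Compute tr((R R^T)^q) for a rectangular matrix R given as a list of rows."""
    rows, cols = len(R), len(R[0])
    S = [[sum(R[i][k] * R[j][k] for k in range(cols)) for j in range(rows)]
         for i in range(rows)]
    P = [[1.0 if i == j else 0.0 for j in range(rows)] for i in range(rows)]
    for _ in range(q):
        P = [[sum(P[i][k] * S[k][j] for k in range(rows)) for j in range(rows)]
             for i in range(rows)]
    return sum(P[i][i] for i in range(rows))


def cyclic_product_sum(R, q):
    """Sum over W_1, ..., W_{2q} (odd positions: rows, even positions: columns) of
    prod_i R(W_i, W_{i+1}) with W_{2q+1} = W_1, reading R(col, row) as R[row][col]."""
    rows, cols = len(R), len(R[0])
    total = 0.0
    for row_idx in itertools.product(range(rows), repeat=q):
        for col_idx in itertools.product(range(cols), repeat=q):
            term = 1.0
            for j in range(q):
                # factor R(W_{2j+1}, W_{2j+2}) times factor R(W_{2j+2}, W_{2j+3})
                term *= R[row_idx[j]][col_idx[j]] * R[row_idx[(j + 1) % q]][col_idx[j]]
            total += term
    return total


rng = random.Random(0)
R = [[rng.uniform(-1, 1) for _ in range(4)] for _ in range(3)]
for q in (1, 2, 3):
    print(q, trace_of_power(R, q), cyclic_product_sum(R, q))
```

Each term of the cyclic sum corresponds to one choice of $\{W_i\}$ in Equation 9.1, which is why bounding the expected number of surviving terms bounds the norm.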

We now bound the number of connected components in H with the following lemma. 

Lemma 9.17. Let $q \ge 2$ and let $H$ be a graph such that

1. Every product edge of $H$ is contained in an independence-breaking cycle.

2. Every constraint edge of $H$ is of the form $(W_i, W_{i+j})$ where $j$ is even.

Then the number of connected components in the graph defined by only the constraint edges of $H$ is at most $q + 1$.

The intuitive idea behind this lemma is that if we add the constraint edges in the right order, every new constraint edge can put two product edges into independence-breaking cycles. For example, a constraint edge between $W_{i-1}$ and $W_{i+1}$ puts the product edges $(W_{i-1}, W_i)$ and $(W_i, W_{i+1})$ into an independence-breaking 3-cycle. If we then add a constraint edge between $W_{i-2}$ and $W_{i+2}$, this puts the product edges $(W_{i-2}, W_{i-1})$ and $(W_{i+1}, W_{i+2})$ into an independence-breaking 4-cycle. The final constraint edge can put 4 product edges into independence-breaking cycles, so the number of constraint edges needed to cover all $2q$ product edges is $q - 1$ (since $2(q-2) + 4 = 2q$), leaving at most $2q - (q-1) = q + 1$ components.

To make this argument work, we use an inductive proof. If no $W_i$ is isolated in $H$ (looking only at constraint edges), we must have at least $q$ constraint edges. On the other hand, if some $W_i$ is isolated, there must be a constraint edge between $W_{i-1}$ and $W_{i+1}$. As noted above, this constraint edge puts the product edges $(W_{i-1}, W_i)$ and $(W_i, W_{i+1})$ into an independence-breaking 3-cycle.

We take this to be the first constraint edge. We then argue that we can essentially delete $W_i$ and merge $W_{i-1}$ and $W_{i+1}$, which allows us to use the inductive hypothesis. We make these ideas rigorous below.

Proof of Lemma 9.17. We prove Lemma 9.17 by induction on $q$. The base case $q = 2$ is trivial: we clearly need at least one constraint edge, so the number of connected components in $H$ is at most 3. Now assume that $q = k \ge 3$ and that the result is true for $q = k - 1$.

First note that if no $W_i$ is isolated (when looking only at constraint edges), then there are at most $q$ connected components in $H$, since every component then contains at least two of the $2q$ vertices. Thus, we may assume that $W_i$ is isolated for some $i$. Now consider the product edge $(W_{i-1}, W_i)$. Since $W_i$ is isolated, there are no independence-breaking 3-cycles or 4-cycles in which $W_i$ is the endpoint of a constraint edge. Thus, $(W_{i-1}, W_i)$ must be part of an independence-breaking 3-cycle consisting of $(W_{i-1}, W_i)$, $(W_i, W_{i+1})$, and a constraint edge $(W_{i-1}, W_{i+1})$.

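Now form a new graph $H'$ as follows: delete $W_i$ and contract the constraint edge between $W_{i-1}$ and $W_{i+1}$. More precisely,

1. Take $V(H') = \left(V(H) \setminus \{W_{i-1}, W_i, W_{i+1}\}\right) \cup \{U\}$.

2. Take $E_{\mathrm{product}}(H') = \left(E_{\mathrm{product}}(H) \setminus \{(W_j, W_{j+1}) : j \in [i-2, i+1]\}\right) \cup \{(W_{i-2}, U), (U, W_{i+2})\}$.

3. Take

$$E_{\mathrm{constraint}}(H') = \Big(E_{\mathrm{constraint}}(H) \setminus \{(W_{i-1}, W_j) : (W_{i-1}, W_j) \in E_{\mathrm{constraint}}(H)\} \setminus \{(W_{i+1}, W_j) : (W_{i+1}, W_j) \in E_{\mathrm{constraint}}(H)\}\Big) \cup \{(U, W_j) : j \notin \{i-1, i+1\},\ (W_{i-1}, W_j) \in E_{\mathrm{constraint}}(H) \text{ or } (W_{i+1}, W_j) \in E_{\mathrm{constraint}}(H)\}.$$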

After doing this, rename $U$ as $W_{i-1}$ and rename each $W_j$ with $j > i + 1$ as $W_{j-2}$. In going from $H$ to $H'$, we have effectively reduced both $q$ and the number of connected components by 1. To complete the proof, we need to check that $H'$ satisfies the hypotheses of the lemma with $q - 1$ in place of $q$. By construction, every constraint edge is still of the form $(W_i, W_{i+j})$ where $j$ is even. We check case by case that every product edge is still part of an independence-breaking cycle.

1. Every independence-breaking cycle in $H$ which does not contain the constraint edge $(W_{i-1}, W_{i+1})$ is preserved in $H'$, except that its vertices may have been renamed. The reason for this is that such an independence-breaking cycle in $H$ cannot contain $W_i$ and can contain at most one of $\{W_{i-1}, W_{i+1}\}$.

2. The independence-breaking 3-cycle in $H$ consisting of the product edges $(W_{i-1}, W_i)$, $(W_i, W_{i+1})$ and the constraint edge $(W_{i-1}, W_{i+1})$ is removed, but so are the product edges $(W_{i-1}, W_i)$ and $(W_i, W_{i+1})$, so this is fine.

3. If we have an independence-breaking 4-cycle in $H$ consisting of the product edges $(W_{i-2}, W_{i-1})$, $(W_{i+1}, W_{i+2})$ and the constraint edges $(W_{i-1}, W_{i+1})$, $(W_{i-2}, W_{i+2})$, this becomes an independence-breaking 3-cycle in $H'$ with product edges $(W_{i-2}, W_{i-1})$, $(W_{i-1}, W_i)$ and constraint edge $(W_{i-2}, W_i)$ (note that $W_{i-1}$ and $W_{i+1}$ are merged into $W_{i-1}$ in $H'$, and $W_{i+2}$ is renamed as $W_i$ in $H'$). Since $W_i$ is isolated in $H$, these are the only independence-breaking cycles of $H$ other than the one in case 2 that use the constraint edge $(W_{i-1}, W_{i+1})$.
$H'$ satisfies the hypotheses of the lemma for $q - 1 \ge 2$, so, looking only at the constraint edges, $H'$ has at most $(q - 1) + 1 = q$ connected components. $H$ has one more connected component than $H'$ (the isolated vertex $W_i$), so $H$ has at most $q + 1$ connected components, as needed. □
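Lemma 9.17 can also be checked exhaustively for small $q$. The sketch below encodes $W_1, \ldots, W_{2q}$ as $0, \ldots, 2q-1$ (indices taken mod $2q$, as in Definition 9.14), enumerates every set of constraint edges allowed by hypothesis 2, keeps those for which every product edge lies on an independence-breaking 3- or 4-cycle in the sense of Definition 9.15, and reports the largest number of components of the constraint graph. It is an illustration under our reading of Definition 9.15, not part of the proof, and the function names are hypothetical.

```python
from itertools import combinations


def count_components(num_vertices, edges):
    """Connected components of a graph on 0..num_vertices-1 (union-find)."""
    parent = list(range(num_vertices))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for u, v in edges:
        parent[find(u)] = find(v)
    return len({find(x) for x in range(num_vertices)})


def covered_product_edges(q, constraint):
    """Product edges lying on an independence-breaking 3- or 4-cycle.

    W_1..W_{2q} are encoded as 0..2q-1 with indices mod 2q; `constraint`
    is a set of frozensets {u, v}, the constraint edges of H.
    """
    m = 2 * q

    def product_edge(i):
        return frozenset({i % m, (i + 1) % m})

    covered = set()
    for i in range(m):
        # 3-cycle: (W_i, W_{i+1}), (W_{i+1}, W_{i+2}) and constraint edge (W_i, W_{i+2})
        if frozenset({i, (i + 2) % m}) in constraint:
            covered |= {product_edge(i), product_edge(i + 1)}
    for i1 in range(m):
        for i2 in range(m):
            for sign in (1, -1):
                # 4-cycle: e1 = (W_{i1}, W_{i1+1}), e2 = (W_{i2}, W_{i2 +- 1}) and
                # constraint edges (W_{i1}, W_{i2}), (W_{i1+1}, W_{i2 +- 1})
                c, d = (i1 + 1) % m, (i2 + sign) % m
                if (i1 != i2 and c != d
                        and frozenset({i1, i2}) in constraint
                        and frozenset({c, d}) in constraint):
                    covered |= {product_edge(i1), frozenset({i2, d})}
    return covered


def max_components(q):
    """Largest number of constraint-graph components over all H meeting the lemma's hypotheses."""
    m = 2 * q
    # hypothesis 2: constraint edges join W_i and W_{i+j} only for even j
    pairs = [frozenset(p) for p in combinations(range(m), 2) if (p[1] - p[0]) % 2 == 0]
    all_product = {frozenset({i, (i + 1) % m}) for i in range(m)}
    best = 0
    for mask in range(1 << len(pairs)):
        constraint = {pairs[k] for k in range(len(pairs)) if (mask >> k) & 1}
        # hypothesis 1: every product edge lies on an independence-breaking cycle
        if covered_product_edges(q, constraint) >= all_product:
            best = max(best, count_components(m, [tuple(e) for e in constraint]))
    return best


for q in (2, 3, 4):
    print(q, max_components(q), "bound:", q + 1)
```

For each $q$ the reported maximum should never exceed $q + 1$; the ordering of constraint edges described before the proof shows that the bound is attained.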

The above lemma combined with Lemma 9.5 gives the following corollary. 

Corollary 9.18. For all terms $\prod_{i=1}^{2q} R_a(W_i, W_{i+1})$ occurring in Equation 9.1 with nonzero expectation, $\left|\bigcup_{i=1}^{2q} W_i\right| \le 2aq - 2q + (q + 1)$.

We can now prove Theorem 9.8.

Proof of Theorem 9.8. By the above corollary, we can apply Theorem 9.1 with $y = 1$ and $z = 1$.

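Every entry of $R_{a,A,B}$ has magnitude at most $2^{a^2}$, so we can take $B = 2^{a^2}$. By Theorem 9.1, if $n \ge 10$, then for all $A$ and $B$ and every $\varepsilon \in (0,1)$,

$$\Pr\left[\|R_{a,A,B}\| > \frac{2^{a^2} e a}{a!}\left(\ln n - \ln\varepsilon + 2\right) n^{a-\frac{1}{2}}\right] < \varepsilon.$$

Since $4\ln n \ge e(\ln n + 2)$ for all $n \ge 100$ and $a! \ge a$, we have that for all $n \ge 100$, for all $A$ and $B$, and all $\varepsilon \in (0,1)$,

$$\Pr\left[\|R_{a,A,B}\| > 2^{a^2+2}\ln\left(\frac{n}{\varepsilon}\right) n^{a-\frac{1}{2}}\right] < \varepsilon.$$

Now by Corollary 9.11, $\|R_a\| \le 2^{2a}\max_{A,B}\|R_{a,A,B}\|$, so

$$\Pr\left[\|R_a\| > 2^{a^2+2a+2}\ln\left(\frac{n}{\varepsilon}\right) n^{a-\frac{1}{2}}\right] < \varepsilon.$$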


□ 
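The elementary step in the proof of Theorem 9.8 uses $4\ln n \ge e(\ln n + 2)$ for $n \ge 100$; together with $e < 4$ and $-\ln\varepsilon > 0$ this gives $e(\ln n - \ln\varepsilon + 2) \le 4\ln(n/\varepsilon)$. The short numerical check below (helper names are our own) locates the exact threshold, $\ln n \ge 2e/(4-e)$, and spot-checks the resulting inequality on a grid.

```python
import math


def threshold_n():
    """Smallest real n with 4 ln n >= e (ln n + 2), i.e. ln n >= 2e / (4 - e)."""
    return math.exp(2 * math.e / (4 - math.e))


def coefficient_gap(n, eps):
    """4 ln(n / eps) - e (ln n - ln eps + 2); nonnegative means the simplification is valid."""
    return 4 * math.log(n / eps) - math.e * (math.log(n) - math.log(eps) + 2)


print(round(threshold_n(), 2))  # a little below 70, so n >= 100 leaves some slack
for n in (100, 10**3, 10**6):
    for eps in (0.5, 1e-3, 1e-9):
        assert coefficient_gap(n, eps) >= 0
```

The gap equals $(4 - e)(\ln n - \ln\varepsilon) - 2e$, so it only grows as $\varepsilon$ shrinks; the condition on $n$ alone is what matters.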


10 Concentration bounds for number of cliques and $\deg_G(I)$

We now prove large deviation bounds for $\deg_G(I)$, leading to Claim 8.13, which we state below in a more precise form.

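Theorem 10.1. If $n \ge 10$ and $\varepsilon \in (0, 1)$, then for all $I \subseteq [n]$ with $|I| = i \le 2r$,

$$\Pr\left[\left|\deg_G(I) - 2^{-\binom{2r}{2} + \binom{i}{2}}\binom{n-i}{2r-i}\right| > 2\left(\ln(128/\varepsilon)\right)^2 n^{2r-i-\frac{1}{2}} \;\middle|\; I \text{ is a clique}\right] < \varepsilon.$$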


To prove the claim, we first show a similar concentration bound for the number of cliques of a certain size in $G$. While similar results appear in the literature (see for instance [Ruc88, Vu01, JLR11]), we give a short direct proof based on Theorem 9.1.

Definition 10.2. For a graph $G$, define $N_a(G)$ to be the number of $a$-cliques in $G$.

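Theorem 10.3. For all $a$, for all $n \ge 10$ and $\varepsilon \in (0,1)$, $\mathbb{E}[N_a(G)] = 2^{-\binom{a}{2}}\binom{n}{a}$ and

$$\Pr\left[\left|N_a(G) - \mathbb{E}[N_a(G)]\right| > \left(\ln(64/\varepsilon)\right)^2 \cdot n^{a-1}\right] < \varepsilon.$$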
Proof. The first part of the theorem is trivial, so we focus on the second part. Given a set of vertices $V$ of size $a$, define $c_V$ to be $1 - 2^{-\binom{a}{2}}$ if $V$ is a clique and $-2^{-\binom{a}{2}}$ otherwise. Then

$$N_a(G) - \mathbb{E}[N_a(G)] = \sum_{V : |V| = a} c_V.$$

Now let us consider the function

$$p(G, 2q) = \Big(\sum_{V : |V| = a} c_V\Big)^{2q} = \sum_{W_1, \ldots, W_{2q}} \ \prod_{i=1}^{2q} c_{W_i}.$$

Note that $\mathbb{E}\left[\prod_{i=1}^{2q} c_{W_i}\right] = 0$ unless each set of vertices $W_i$ has at least two vertices in common with a different set of vertices $W_j$ (otherwise $c_{W_i}$ depends only on edges that appear in no other factor, and it has mean 0). Now consider a graph $G_2$ whose vertices are $\{W_1, \ldots, W_{2q}\}$, with an edge between $W_i$ and $W_j$ if $|W_i \cap W_j| \ge 2$. Let $t$ be the number of connected components of $G_2$. We claim that $\left|\bigcup_i W_i\right| \le 2aq - 4q + 2t$. Indeed, as in the proof of Lemma 9.5, first consider elements $W_{i_1}, \ldots, W_{i_t}$ belonging to the $t$ different connected components. Now add the remaining elements of $\{W_1, \ldots, W_{2q}\}$ so that each new element is adjacent to at least one of the previously added sets. When doing so, each step increases the size of the union by at most $a - 2$. Therefore, the size of the union is at most $at + (a - 2)(2q - t) = 2aq - 4q + 2t$. On the other hand, each connected component of $G_2$ must have at least two vertices, so $t \le q$. Therefore, $\left|\bigcup_i W_i\right| \le 2aq - 2q$.
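The counting step above, $\left|\bigcup_i W_i\right| \le at + (a-2)(2q-t) = 2aq - 4q + 2t$, holds for any family of $a$-sets once $t$ is the number of components of $G_2$, and it becomes $2aq - 2q$ when every $W_i$ overlaps some other $W_j$ in at least two vertices. The sketch below checks both statements on random families over a small ground set; the parameter `min_overlap` (2 here) and the helper names are our own.

```python
import random
from itertools import combinations


def overlap_components(family, min_overlap=2):
    """Number of components of G_2, where W_i ~ W_j iff |W_i & W_j| >= min_overlap."""
    parent = list(range(len(family)))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for i, j in combinations(range(len(family)), 2):
        if len(family[i] & family[j]) >= min_overlap:
            parent[find(i)] = find(j)
    return len({find(i) for i in range(len(family))})


def union_bound(a, q, t, min_overlap=2):
    """a*t + (a - min_overlap)*(2q - t): one root set per component (at most a new
    vertices each), then every further set adds at most a - min_overlap new vertices."""
    return a * t + (a - min_overlap) * (2 * q - t)


def check_random_families(a, q, ground, trials, seed=0):
    """Check both bounds on random families of 2q a-subsets of range(ground)."""
    rng = random.Random(seed)
    partnered = 0
    for _ in range(trials):
        family = [frozenset(rng.sample(range(ground), a)) for _ in range(2 * q)]
        t = overlap_components(family)
        size = len(frozenset().union(*family))
        assert union_bound(a, q, t) == 2 * a * q - 4 * q + 2 * t
        assert size <= union_bound(a, q, t)
        # every W_i shares >= 2 vertices with some other W_j: the only terms that can survive
        if all(any(len(family[i] & family[j]) >= 2 for j in range(2 * q) if j != i)
               for i in range(2 * q)):
            partnered += 1
            assert t <= q and size <= 2 * a * q - 2 * q
    return partnered


print(check_random_families(a=4, q=2, ground=14, trials=3000))
```

The ground set is chosen larger than $2aq - 2q$ so that the bound is not satisfied trivially.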

We can now apply Theorem 9.1 with $y = 2$, $z = 0$ and $B = 1$, so that for $n \ge 10$ and $\varepsilon \in (0,1)$,


Pr 


\Na{G) - E[Na{G)]\ > i - {ea(^ 


' — Ine 


+ 1 


n 


a—1 


< e. 


Using the facts that < 8 and ^ < 2 for all nonnegative integers m, we have that 


$$\Pr\left[\left|N_a(G) - \mathbb{E}[N_a(G)]\right| > \left(\ln(64/\varepsilon)\right)^2 \cdot n^{a-1}\right] < \varepsilon.$$


□ 
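For small $n$ the objects in the proof of Theorem 10.3 can be computed exactly by averaging over all $2^{\binom{n}{2}}$ graphs. The sketch below (names are ours) checks $\mathbb{E}[N_a(G)] = 2^{-\binom{a}{2}}\binom{n}{a}$, the identity $N_a(G) - \mathbb{E}[N_a(G)] = \sum_{|V|=a} c_V$, and that $\mathbb{E}[c_V c_W] = 0$ when $|V \cap W| \le 1$ but not when $|V \cap W| = 2$, which is why only overlaps of at least two vertices matter.

```python
from fractions import Fraction
from itertools import combinations
from math import comb


def all_graphs(n):
    """Yield every graph on 0..n-1 as a set of sorted edge tuples (uniform = G(n, 1/2))."""
    pairs = list(combinations(range(n), 2))
    for mask in range(1 << len(pairs)):
        yield {pairs[k] for k in range(len(pairs)) if (mask >> k) & 1}


def is_clique(edges, vertices):
    return all(p in edges for p in combinations(sorted(vertices), 2))


def c_value(edges, vertices, a):
    """c_V = 1 - 2^{-C(a,2)} if V is a clique, and -2^{-C(a,2)} otherwise."""
    return int(is_clique(edges, vertices)) - Fraction(1, 2 ** comb(a, 2))


def exact_checks(n, a):
    """Exact mean of N_a(G) and whether N_a - E[N_a] = sum_V c_V holds for every graph."""
    graphs = list(all_graphs(n))
    subsets = list(combinations(range(n), a))
    mean = Fraction(sum(sum(is_clique(g, V) for V in subsets) for g in graphs), len(graphs))
    identity = all(
        sum(is_clique(g, V) for V in subsets) - mean == sum(c_value(g, V, a) for V in subsets)
        for g in graphs
    )
    return mean, identity, graphs


def pair_moment(graphs, V, W, a):
    """E[c_V c_W] computed exactly over all graphs."""
    return sum(c_value(g, V, a) * c_value(g, W, a) for g in graphs) / len(graphs)


mean, identity, graphs = exact_checks(5, 3)
print(mean, comb(5, 3) * Fraction(1, 2 ** comb(3, 2)), identity)
print(pair_moment(graphs, (0, 1, 2), (2, 3, 4), 3))  # the two triangles share one vertex
print(pair_moment(graphs, (0, 1, 2), (0, 1, 3), 3))  # the two triangles share an edge
```

Exact rational arithmetic (`Fraction`) is used so that the vanishing of $\mathbb{E}[c_V c_W]$ is an exact zero rather than a floating-point near-zero.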


We are now ready to prove Theorem 10.1. The idea is as follows. Let $A_I$ be the set of vertices which are adjacent to all the vertices in $I$. Then, conditioned on $I$ being a clique, $\deg_G(I)$ is just the number of cliques of size $2r - i$ among the vertices of $A_I$, which is primarily determined by $|A_I|$. This is because the edges between vertices of $A_I$ are independent of the edges involving vertices in $I$, so that we can apply Theorem 10.3 to $A_I$.

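Proof of Theorem 10.1. Let $A_I$ be as above and let us condition on $I$ being a clique. Then $\deg_G(I)$ is just the number of cliques of size $2r - i$ among the vertices in $A_I$. Therefore, by Theorem 10.3 (applied with $\varepsilon/2$ in place of $\varepsilon$ to the graph induced on $A_I$), with probability at least $1 - \varepsilon/2$,

$$\left|\deg_G(I) - 2^{-\binom{2r-i}{2}}\binom{|A_I|}{2r-i}\right| \le \left(\ln(128/\varepsilon)\right)^2 \cdot |A_I|^{2r-i-1}.$$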


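We next argue that $\binom{|A_I|}{2r-i}$ is concentrated around its mean. For $j \notin I$, let $X_j$ be the indicator random variable that is 1 if the $j$'th vertex is adjacent to all the vertices in $I$ and 0 otherwise. Then $|A_I| = \sum_{j \notin I} X_j$ and

$$\binom{|A_I|}{2r-i} = \sum_{J \subseteq [n] \setminus I,\ |J| = 2r-i} \ \prod_{j \in J} X_j =: f(\{X_j : j \notin I\}).$$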
Observe that the random variables $X_j$ are independent of each other and that

$$\mathbb{E}\left[f(\{X_j : j \notin I\})\right] = 2^{-i(2r-i)}\binom{n-i}{2r-i}.$$

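We next apply McDiarmid's inequality to the function $f$. Note that changing any single coordinate of the input to $f$ can change its value by at most $\binom{n-i-1}{2r-i-1} \le n^{2r-i-1}$. Therefore, by Theorem 4.2, with probability at least $1 - \varepsilon/2$,

$$\left|\binom{|A_I|}{2r-i} - 2^{-i(2r-i)}\binom{n-i}{2r-i}\right| \le \sqrt{\ln(4/\varepsilon)} \cdot n^{2r-i-\frac{1}{2}}.$$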


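Combining the above inequalities, we get that with probability at least $1 - \varepsilon$,

$$\left|\deg_G(I) - 2^{-\binom{2r-i}{2} - i(2r-i)}\binom{n-i}{2r-i}\right| \le \left(\ln(128/\varepsilon)\right)^2 \cdot n^{2r-i-1} + 2^{-\binom{2r-i}{2}} \cdot \sqrt{\ln(4/\varepsilon)} \cdot n^{2r-i-\frac{1}{2}} \le 2\left(\ln(128/\varepsilon)\right)^2 n^{2r-i-\frac{1}{2}}.$$

The theorem now follows as $\binom{2r-i}{2} + i(2r-i) = \binom{2r}{2} - \binom{i}{2}$.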


□ 
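The ingredients of the proof of Theorem 10.1 that do not depend on Theorem 9.1 can be checked directly. Since each $X_j$ is an independent Bernoulli$(2^{-i})$ variable, $|A_I| \sim \mathrm{Bin}(n-i, 2^{-i})$, so $\mathbb{E}\left[\binom{|A_I|}{m}\right] = 2^{-im}\binom{n-i}{m}$ with $m = 2r - i$; flipping one $X_j$ changes $\binom{|A_I|}{m}$ by at most $\binom{n-i-1}{m-1}$; and the exponent identity at the end of the proof holds. The sketch below checks these and evaluates the deviation that McDiarmid's inequality yields, assuming Theorem 4.2 is the standard two-sided form $\Pr[|f - \mathbb{E}f| \ge t] \le 2\exp(-2t^2/\sum_j c_j^2)$. Helper names are hypothetical.

```python
import math
from fractions import Fraction
from math import comb


def mean_binom_common_neighbours(n, i, m):
    """Exact E[C(|A_I|, m)] for |A_I| ~ Binomial(n - i, 2^{-i})."""
    p, N = Fraction(1, 2 ** i), n - i
    return sum(comb(N, k) * p**k * (1 - p) ** (N - k) * comb(k, m) for k in range(N + 1))


def max_single_change(n, i, m):
    """Largest change of C(s, m) when s = |A_I| moves by one, over 0 <= s < n - i (m >= 1)."""
    return max(comb(s + 1, m) - comb(s, m) for s in range(n - i))


def mcdiarmid_deviation(n, i, m, eps):
    """t with 2 exp(-2 t^2 / ((n - i) c^2)) = eps / 2, where c = C(n-i-1, m-1)."""
    c = comb(n - i - 1, m - 1)
    return c * math.sqrt((n - i) * math.log(4 / eps) / 2)


n, i, m = 12, 2, 3
print(mean_binom_common_neighbours(n, i, m) == Fraction(comb(n - i, m), 2 ** (i * m)))
print(max_single_change(n, i, m) == comb(n - i - 1, m - 1))
print(mcdiarmid_deviation(n, i, m, 0.01) <= math.sqrt(math.log(4 / 0.01)) * n ** (m - 0.5))
# exponent identity: C(2r - i, 2) + i (2r - i) = C(2r, 2) - C(i, 2)
print(all(comb(2 * r - k, 2) + k * (2 * r - k) == comb(2 * r, 2) - comb(k, 2)
          for r in range(1, 8) for k in range(2 * r + 1)))
```

The mean is computed as a binomial factorial moment in exact arithmetic, so the first check is an identity rather than a numerical approximation.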


11 Conclusion and future work 

In this work we showed a lower bound for the maximum clique problem on random $G(n, 1/2)$ graphs in the SOS hierarchy and the Positivstellensatz proof system. Besides the specific application to clique lower bounds, the PSD-ness of the matrix $M$ from Equation 2.5 seems to carry further information that could potentially be useful elsewhere, perhaps for studying various subgraph statistics. Further, the arguments related to association schemes and to bounding the norm of locally random matrices could also be useful elsewhere, especially for other SOS hierarchy lower bounds. One natural and interesting candidate is the densest subgraph problem.

For planted clique itself, the most obvious open problem is to tighten the gap between the current upper bound of $O(\sqrt{n/2^r})$ and our lower bound of $(\sqrt{n}/\log n)^{1/r}$ for $r$ rounds of the SOS hierarchy. In particular, can a constant number of rounds of SOS beat the square-root barrier and identify planted cliques of size $o(n^{1/2})$? Kelner showed that our dual certificate $M$ actually
is not PSD for k roughly Thus one needs to come up with a different dual certificate 
to approach the upper bound of $\sqrt{n}$ even for $r = 2$.
12 Hierarchy Gaps and Positivstellensatz Refutations

For a detailed discussion of the hierarchies and PS(r)-refutations we refer the reader to the discussions in [OZ13]. The basic principle is that, typically, PS(r)-refutations are more robust and stronger than the hierarchy formulations.

The SOS (or Lasserre) relaxation for maximum clique is stated in Figure 1 (cf. [Tul09]). Although the formulation itself is not in terms of an SDP, it is a standard fact that, as the program only involves inner products of vectors, the optimization can be done by semidefinite programming. The connection between Figure 1 and PS(r)-refutations comes from the following straightforward lemma, stating that a certificate for PS(r)-refutations is simply a primal solution to the standard $r$-round SOS relaxation of the problem.
SOS-relaxation for Max-Clique. Input: Graph $G = (V, E)$, $r$ = number of rounds. Variables of the SDP are vectors $U_S$, where $S \subseteq [n]$, $|S| \le r$.

maximize $\sum_{i \in V} \|U_{\{i\}}\|^2$

such that
$\langle U_{\{i\}}, U_{\{j\}} \rangle = 0$ for all $i, j$ with $\{i, j\} \notin E$,
$\langle U_{S_1}, U_{S_2} \rangle = \langle U_{S_3}, U_{S_4} \rangle$ for all $S_1 \cup S_2 = S_3 \cup S_4$, $|S_1 \cup S_2| \le r$,
$\langle U_{S_1}, U_{S_2} \rangle \in [0, 1]$ for all $|S_1|, |S_2| \le r$.

Figure 1: r-round SOS-relaxation for Maximum Clique


Lemma 12.1. Let $G = (V, E)$ be a graph and let $\mathrm{Clique}(G, k)$ denote the clique axioms as defined by Equations 1.2. Suppose that there exists a dual certificate $\mathcal{M} : \mathcal{P}(n, 2r) \to \mathbb{R}$ for $\mathrm{Clique}(G, k)$ as defined in Definition 1.7. Then the value of the $r$-round SOS relaxation for maximum clique given by Figure 1 is at least $k$.

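Proof. Let $\mathcal{M} : \mathcal{P}(n, 2r) \to \mathbb{R}$ be the dual certificate and let $M \in \mathbb{R}^{\binom{[n]}{\le r} \times \binom{[n]}{\le r}}$ be the corresponding PSD matrix. Without loss of generality, suppose that $M(\emptyset, \emptyset) = 1$. Let $M = UU^T$ where $U \in \mathbb{R}^{\binom{[n]}{\le r} \times N}$ for some $N$. Finally, for $S \in \binom{[n]}{\le r}$, let $U_S$ be the $S$'th row of $U$. We claim that the collection $\{U_S : |S| \le r\}$ gives a feasible solution for the SDP in Figure 1.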

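Observe that for any two subsets $S_1, S_2 \in \binom{[n]}{\le r}$,

$$\langle U_{S_1}, U_{S_2} \rangle = M(S_1, S_2) = \mathcal{M}(X_{S_1 \cup S_2}).$$

Therefore, the vectors $\{U_S : |S| \le r\}$ satisfy the first two constraints of Figure 1, as $\mathcal{M}$ is a dual certificate. Further, $\|U_\emptyset\|_2^2 = M(\emptyset, \emptyset) = 1$ and, for any set $S$, by Cauchy-Schwarz,

$$\|U_S\|_2^2 = \langle U_S, U_S \rangle = \langle U_S, U_\emptyset \rangle \le \|U_S\|_2,$$

so that $\|U_S\|_2 \le 1$. Thus, $\{U_S : |S| \le r\}$ gives a feasible solution for the program in Figure 1.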
Finally, the value of the solution is

$$\sum_{i \in V} \|U_{\{i\}}\|_2^2 = \sum_{i \in V} \mathcal{M}(X_{\{i\}}) = k.$$

This proves the lemma. □

Our main theorems now follow.


Proof of Theorem 1.1. Let $G \leftarrow G(n, 1/2)$. Then, from the above lemma and the proof of Theorem 1.5 (where we showed the existence of a dual certificate for the clique axioms), the value of the $r$-round SOS relaxation for max-clique on $G$ is at least $n^{1/2r}/\left(C^r (\log n)^{1/r}\right)$ with high probability. The claim follows as the integral value is $(2 + o(1))\log_2 n$ with high probability. □
Proof of Corollary 1.2. The value of the relaxation in Figure 1 is clearly monotone with respect to adding edges. Therefore, from the above argument, for $G \leftarrow G(n, 1/2, t)$ the value of the $r$-round SOS relaxation for max-clique on $G$ is at least $n^{1/2r}/\left(C^r (\log n)^{1/r}\right)$ with high probability. The claim follows as the integral value is $t$ with high probability. □
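The mechanics of the proof of Lemma 12.1 (factor $M = UU^T$, read off the rows $U_S$, and check the constraints of Figure 1) can be sketched in a few lines. As a hypothetical stand-in for a dual certificate we use the integral one coming from an actual clique $K$, namely $\mathcal{M}(X_S) = 1$ if $S \subseteq K$ and $0$ otherwise, so the relaxation value should equal $|K|$. The factorization is a Cholesky decomposition that skips zero pivots, which is one of several ways to obtain $U$. This is an illustration, not the certificate constructed in this paper.

```python
import math
from itertools import combinations


def psd_factor(M, tol=1e-12):
    """Return L with M = L L^T for a symmetric PSD matrix M (Cholesky, skipping zero pivots)."""
    N = len(M)
    L = [[0.0] * N for _ in range(N)]
    for j in range(N):
        d = M[j][j] - sum(L[j][k] ** 2 for k in range(j))
        if d > tol:
            L[j][j] = math.sqrt(d)
            for i in range(j + 1, N):
                L[i][j] = (M[i][j] - sum(L[i][k] * L[j][k] for k in range(j))) / L[j][j]
        # otherwise column j stays zero, which is consistent because M is PSD
    return L


def sos_solution_from_clique(n, edges, clique, r, tol=1e-9):
    """Build U_S from the integral certificate of `clique`; return (value, feasible)."""
    sets = [frozenset(s) for size in range(r + 1) for s in combinations(range(n), size)]
    K = frozenset(clique)
    # M(S1, S2) = pseudo-expectation of X_{S1 u S2} = 1 iff S1 u S2 is inside K
    M = [[1.0 if (S1 | S2) <= K else 0.0 for S2 in sets] for S1 in sets]
    U = dict(zip(sets, psd_factor(M)))

    def ip(S1, S2):
        return sum(x * y for x, y in zip(U[S1], U[S2]))

    # non-edges get orthogonal vectors
    feasible = all(abs(ip(frozenset({i}), frozenset({j}))) < tol
                   for i, j in combinations(range(n), 2) if (i, j) not in edges)
    # every inner product lies in [0, 1]
    feasible &= all(-tol <= ip(S1, S2) <= 1 + tol for S1 in sets for S2 in sets)
    # inner products depend only on the union S1 u S2
    by_union = {}
    for S1 in sets:
        for S2 in sets:
            by_union.setdefault(S1 | S2, []).append(ip(S1, S2))
    feasible &= all(max(v) - min(v) < tol for v in by_union.values())
    value = sum(ip(frozenset({i}), frozenset({i})) for i in range(n))
    return value, feasible


edges = {(0, 1), (0, 2), (1, 2), (2, 3), (3, 4)}
value, feasible = sos_solution_from_clique(5, edges, clique={0, 1, 2}, r=2)
print(round(value, 6), feasible)
```

Replacing `clique` by a set that is not a clique, such as `{0, 3}`, makes the orthogonality constraint for the non-edge $(0, 3)$ fail, which mirrors why the clique axioms are needed for the certificate.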